7.1. Regex About

Also known as: re, regex, regexp, Regular Expressions

W3C HTML5 Standard [4] regexp for email field:

>>> pattern = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
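As a rough illustration of how this pattern might be used for validation, the sketch below wraps it in a small helper. The name is_valid_email is hypothetical (it is not part of the standard or of the re module), and the sample addresses are made up for the example.

```python
import re

# W3C HTML5 pattern for the email field, copied from the text above
pattern = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"


def is_valid_email(text):
    """Hypothetical helper: True if text matches the pattern."""
    return re.match(pattern, text) is not None


valid = is_valid_email('alice@example.com')        # True
no_at_sign = is_valid_email('alice.example.com')   # False, there is no @
no_dot = is_valid_email('alice@localhost')         # True, a dot in the domain is optional
```

Note that the last address is accepted: the group that matches a dot followed by a label is quantified with *, so the pattern does not require a top-level domain.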

7.1.1. SetUp

string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'
string is short
string has name
string has date
string has time
string has punctuation (comma, colon)
string has digits and multi-digit numbers
string has an ordinal suffix (st) - one of st, nd, rd, th
string has lowercase and uppercase letters
string has email address
string has prepositions (on, at)

>>> import re

>>> string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'

>>> STRING = """Apollo 11 was the American spaceflight that first landed
... humans on the Moon. Commander (CDR) Neil Armstrong and lunar module
... pilot (LMP) Buzz Aldrin landed the Apollo Lunar Module (LM) Eagle on
... July 20th, 1969 at 20:17 UTC, and Armstrong became the first person
... to step (EVA) onto the Moon's surface (EVA) 6 hours 39 minutes later,
... on July 21st, 1969 at 02:56:15 UTC. Aldrin joined him 19 minutes later.
... They spent 2 hours 31 minutes exploring the site they had named Tranquility
... Base upon landing. Armstrong and Aldrin collected 47.5 pounds (21.5 kg)
... of lunar material to bring back to Earth as pilot Michael Collins (CMP)
... flew the Command Module (CM) Columbia in lunar orbit, and were on the
... Moon's surface for 21 hours 36 minutes before lifting off to rejoin
... Columbia."""

7.1.2. Python

import re
re.findall() - find all occurrences of pattern in string, returns list[str]
re.finditer() - find all occurrences of pattern in string, returns Iterator[re.Match]
re.search() - find first occurrence of pattern anywhere in string, returns re.Match or None (stops after first match)
re.match() - check if pattern matches at the beginning of string, used in validation: phone, email, tax id, etc., returns re.Match or None
re.compile() - compile pattern into object for further use, for example in a loop, returns re.Pattern
re.split() - split string by pattern, returns list[str]
re.sub() - substitute pattern in string with something else, returns str
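The functions listed above can be compared side by side on the string from the SetUp section. This is a minimal sketch; the patterns and variable names are chosen only for illustration.

```python
import re

string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'

# findall: all matches as a list of strings
digits = re.findall(r'\d+', string)            # ['1', '2000', '12', '00']

# finditer: all matches as re.Match objects, here reduced to start positions
starts = [m.start() for m in re.finditer(r'\d+', string)]

# search: first match anywhere in the string (or None)
first = re.search(r'\d+', string)              # matches '1'

# match: only succeeds at the beginning of the string (or None)
at_start = re.match(r'\d+', string)            # None, string starts with 'On'

# compile: reusable re.Pattern object with the same methods
number = re.compile(r'\d+')
same_digits = number.findall(string)

# split: cut the string on commas followed by optional whitespace
parts = re.split(r',\s*', string)

# sub: replace everything between angle brackets
hidden = re.sub(r'<[^>]+>', '<hidden>', string)
```

Both re.search() and re.match() return None when nothing matches, so the result should be checked before calling methods such as .group() on it.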

7.1.3. Syntax

Character Class - what to find (single character)
Qualifiers - set or range of characters to find
Negation
Quantifiers - how many occurrences of the preceding qualifier or character class
Groups
Look Ahead and Look Behind
Flags
Extensions
[] - Qualifier
{} - Quantifier
() - Groups
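Each of the syntax elements above can be tried on the string from the SetUp section. The following is only a sketch with illustrative patterns; later chapters presumably cover each element in detail.

```python
import re

string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'

# [] qualifier: one character from a set or range
uppercase = re.findall(r'[A-Z]', string)       # ['O', 'S', 'J', 'A', 'M', 'A']

# [^...] negation: one character NOT in the set
punctuation = re.findall(r'[^a-zA-Z0-9 ]', string)

# {} quantifier: exactly four digits in a row
year = re.findall(r'[0-9]{4}', string)         # ['2000']

# () groups: capture hour and minute separately
time = re.findall(r'([0-9]{2}):([0-9]{2})', string)   # [('12', '00')]

# look ahead: letters that are followed by @
username = re.findall(r'[a-z]+(?=@)', string)  # ['alice']

# look behind: letters and dots that are preceded by @
domain = re.findall(r'(?<=@)[a-z.]+', string)  # ['example.com']

# flags: case-insensitive search
names = re.findall(r'alice', string, flags=re.IGNORECASE)  # ['Alice', 'alice']
```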

7.1.4. Under the Hood

ASCII table
chr()
ord()
re.DEBUG

>>> chr(97)
'a'

>>> ord('a')
97

>>> string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'
>>>
>>> [ord(x) for x in string]
[79, 110, 32, 83, 117, 110, 44, 32, 74, 97, 110, 32, 49, 115, 116, 44, 32, 50, 48, 48, 48, 32, 97, 116, 32, 49, 50, 58, 48, 48, 32, 65, 77, 32, 65, 108, 105, 99, 101, 32, 60, 97, 108, 105, 99, 101, 64, 101, 120, 97, 109, 112, 108, 101, 46, 99, 111, 109, 62, 32, 119, 114, 111, 116, 101]

>>> string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'
>>> pattern = r'a'
>>>
>>> re.findall(pattern, string, flags=re.DEBUG)
LITERAL 97

 0. INFO 8 0b11 1 1 (to 9)
      prefix_skip 1
      prefix [0x61] ('a')
      overlap [0]
 9: LITERAL 0x61 ('a')
11. SUCCESS
['a', 'a', 'a', 'a']

7.1.5. Visualization

Figure 7.1. Visualization for pattern `r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$'` [1]
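The pattern from Figure 7.1 can also be tried directly in Python. The sketch below assumes a few made-up sample addresses and only checks which of them the pattern accepts.

```python
import re

# Pattern from Figure 7.1
pattern = r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$'

# Sample addresses, invented for this example
emails = ['alice@example.com', 'alice@localhost', '.alice@example.com']

results = {email: re.match(pattern, email) is not None for email in emails}
# 'alice@example.com'  -> True
# 'alice@localhost'    -> False, the pattern requires a dot after the domain name
# '.alice@example.com' -> False, the first character must be a letter or digit
```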

7.1.6. Further Reading

https://www.youtube.com/watch?v=BmF-gEYXWVM&list=PLv4THqSPE6meFeo_jNLgUVKkP40UstIQv&index=3
Kinsley, Harrison "Sentdex". Python 3 Programming Tutorial - Regular Expressions / Regex with re. Year: 2014. Retrieved: 2021-04-11. URL: https://www.youtube.com/watch?v=sZyAn2TW7GY
https://www.rexegg.com/regex-trick-conditional-replacement.html
https://www.rexegg.com/regex-lookarounds.html
https://www.rexegg.com/regex-anchors.html#z
