7.1. Regex About
Also known as: re, regex, regexp, Regular Expressions
Simplified form of the W3C HTML5 Standard [4] regexp for the email input field:
>>> pattern = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
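A minimal sketch of how such a pattern might be used for validation, assuming it is applied with `re.match()`. The helper name `is_valid_email` is hypothetical. The last two lines show a detail worth knowing: `$` also matches just before a trailing newline, while `re.fullmatch()` requires the whole string to be consumed.

```python
import re

# Simplified W3C HTML5 email pattern, copied from above
pattern = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"


def is_valid_email(text: str) -> bool:
    """Hypothetical helper: True when text matches the email pattern."""
    # re.match() anchors at the start of text; the trailing $ anchors the end
    return re.match(pattern, text) is not None


print(is_valid_email('alice@example.com'))   # True
print(is_valid_email('alice@example'))       # True, the dotted part is optional here
print(is_valid_email('alice example.com'))   # False, no @ and contains a space

# Caveat: $ also matches just before a trailing newline
print(is_valid_email('alice@example.com\n'))          # True
print(re.fullmatch(pattern, 'alice@example.com\n'))   # None, the newline is not consumed
```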
7.1.1. SetUp
Example string:
string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'

The string:
- is short
- has a name
- has a date
- has a time
- has punctuation (',' and ':')
- has digits and numbers
- has an ordinal ('st'), one of the suffixes st, nd, rd, th
- has lowercase and uppercase letters
- has an email address
- has prepositions (On, at)
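Each feature listed above can be located with a small pattern. This is a sketch: the patterns are illustrative and tuned to this particular string, not general-purpose date, time, or email parsers. The variable names are hypothetical.

```python
import re

string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'

# Illustrative patterns, one per feature; tuned to this string only
date = re.search(r'[A-Z][a-z]{2} \d{1,2}(?:st|nd|rd|th), \d{4}', string).group()
time_of_day = re.search(r'\d{2}:\d{2} [AP]M', string).group()
name = re.search(r'[A-Z][a-z]+(?= <)', string).group()   # word followed by ' <'
email = re.search(r'<([^>]+)>', string).group(1)         # text between < and >
ordinal = re.search(r'\d+(st|nd|rd|th)', string).group(1)
numbers = re.findall(r'\d+', string)
punctuation = re.findall(r'[,:]', string)

print(date)         # Jan 1st, 2000
print(time_of_day)  # 12:00 AM
print(name)         # Alice
print(email)        # alice@example.com
print(ordinal)      # st
print(numbers)      # ['1', '2000', '12', '00']
print(punctuation)  # [',', ',', ':']
```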
>>> import re
>>> string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'
>>> STRING = """Apollo 11 was the American spaceflight that first landed
... humans on the Moon. Commander (CDR) Neil Armstrong and lunar module
... pilot (LMP) Buzz Aldrin landed the Apollo Lunar Module (LM) Eagle on
... July 20th, 1969 at 20:17 UTC, and Armstrong became the first person
... to step (EVA) onto the Moon's surface (EVA) 6 hours 39 minutes later,
... on July 21st, 1969 at 02:56:15 UTC. Aldrin joined him 19 minutes later.
... They spent 2 hours 31 minutes exploring the site they had named Tranquility
... Base upon landing. Armstrong and Aldrin collected 47.5 pounds (21.5 kg)
... of lunar material to bring back to Earth as pilot Michael Collins (CMP)
... flew the Command Module (CM) Columbia in lunar orbit, and were on the
... Moon's surface for 21 hours 36 minutes before lifting off to rejoin
... Columbia."""
7.1.2. Python
import re
re.findall() - find all occurrences of pattern in string, returns list[str] (a list of tuples if the pattern has more than one group)
re.finditer() - find all occurrences of pattern in string, returns Iterator[re.Match]
re.search() - find the first occurrence of pattern anywhere in string (stops after first match), returns re.Match or None
re.match() - check if the beginning of string matches pattern, used in validation (phone, email, tax id, etc.), returns re.Match or None
re.compile() - compile pattern into an object for further use, for example in a loop, returns re.Pattern
re.split() - split string by pattern, returns list[str]
re.sub() - substitute pattern in string with something else, returns str
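A short sketch exercising each function above on the example string, using the digit pattern `\d+` throughout so the results are easy to compare. Variable names are illustrative only.

```python
import re

string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'

# findall: every match, as strings
numbers = re.findall(r'\d+', string)
# finditer: every match, as re.Match objects (produced lazily)
positions = [(m.start(), m.group()) for m in re.finditer(r'\d+', string)]
# search: first match anywhere in the string, or None
first = re.search(r'\d+', string)
# match: only at the beginning of the string, or None
at_start = re.match(r'\d+', string)
greeting = re.match(r'On', string)
# compile: build the pattern once, then reuse its methods
digits = re.compile(r'\d+')
# split: cut the string wherever the pattern matches
parts = re.split(r', ', string)
# sub: replace every match
masked = re.sub(r'\d', '#', string)

print(numbers)            # ['1', '2000', '12', '00']
print(positions)          # [(12, '1'), (17, '2000'), (25, '12'), (28, '00')]
print(first.group())      # 1
print(at_start)           # None, the string starts with 'On'
print(greeting.group())   # On
print(digits.findall(string) == numbers)  # True
print(parts)   # ['On Sun', 'Jan 1st', '2000 at 12:00 AM Alice <alice@example.com> wrote']
print(masked)  # On Sun, Jan #st, #### at ##:## AM Alice <alice@example.com> wrote
```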
7.1.3. Syntax
Character Class - what to find (single character)
Qualifiers - set or range of characters to find, for example [a-z]
Negation
Quantifiers - how many occurrences of preceding qualifier or character class
Groups
Look Ahead and Look Behind
Flags
Extensions
[] - Qualifier
{} - Quantifier
() - Groups
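The syntax elements listed above, each shown once on the example string. This is a sketch; the lookahead, lookbehind, and named-group forms are standard `re` extension notation, and the variable names are illustrative.

```python
import re

string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'

# [] qualifier: one character from the given range
upper = re.findall(r'[A-Z]', string)
# [^...] negation: one character NOT in the given ranges (and not a space)
other = re.findall(r'[^a-zA-Z0-9 ]', string)
# {} quantifier: exactly four digits in a row
year = re.findall(r'[0-9]{4}', string)
# () groups: capture hour and minute separately
hour, minute = re.search(r'([0-9]{2}):([0-9]{2})', string).groups()
# look ahead / look behind: check the context without consuming it
ordinal_digits = re.findall(r'[0-9]+(?=st)', string)
user = re.findall(r'(?<=<)[a-z]+', string)
# flags: ignore case
names = re.findall(r'alice', string, flags=re.IGNORECASE)
# extensions: named groups
clock = re.search(r'(?P<hour>[0-9]{2}):(?P<minute>[0-9]{2})', string).groupdict()

print(upper)           # ['O', 'S', 'J', 'A', 'M', 'A']
print(other)           # [',', ',', ':', '<', '@', '.', '>']
print(year)            # ['2000']
print(hour, minute)    # 12 00
print(ordinal_digits)  # ['1']
print(user)            # ['alice']
print(names)           # ['Alice', 'alice']
print(clock)           # {'hour': '12', 'minute': '00'}
```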
7.1.4. Under the Hood
ASCII table
chr()
ord()
re.DEBUG
>>> chr(97)
'a'
>>> ord('a')
97
>>> string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'
>>>
>>> [ord(x) for x in string]
[79, 110, 32, 83, 117, 110, 44, 32, 74, 97, 110, 32, 49, 115, 116, 44, 32, 50, 48, 48, 48, 32, 97, 116, 32, 49, 50, 58, 48, 48, 32, 65, 77, 32, 65, 108, 105, 99, 101, 32, 60, 97, 108, 105, 99, 101, 64, 101, 120, 97, 109, 112, 108, 101, 46, 99, 111, 109, 62, 32, 119, 114, 111, 116, 101]
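Ranges inside a character class are ranges of code points, which is why `chr()` and `ord()` are useful for understanding them. The sketch below shows that `[a-z]` selects the same characters as a manual `ord()` comparison, and why `[A-z]` is a common pitfall: it also covers the six code points between 'Z' (90) and 'a' (97).

```python
import re

string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'

# A range in a character class is a range of code points: [a-z] is 97..122
print(ord('a'), ord('z'))  # 97 122
lowercase = ''.join(chr(i) for i in range(ord('a'), ord('z') + 1))
print(lowercase)           # abcdefghijklmnopqrstuvwxyz

# The same filter written by hand with ord() and with a character class
by_hand = [c for c in string if ord('a') <= ord(c) <= ord('z')]
by_regex = re.findall(r'[a-z]', string)
print(by_hand == by_regex)  # True

# Pitfall: [A-z] also includes the code points between 'Z' and 'a'
between = [chr(i) for i in range(ord('Z') + 1, ord('a'))]
print(between)                              # ['[', '\\', ']', '^', '_', '`']
print(re.findall(r'[A-z]+', 'snake_case'))  # ['snake_case'], underscore included
```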
>>> string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'
>>> pattern = r'a'
>>>
>>> re.findall(pattern, string, flags=re.DEBUG)
LITERAL 97
 0. INFO 8 0b11 1 1 (to 9)
      prefix_skip 1
      prefix [0x61] ('a')
      overlap [0]
 9: LITERAL 0x61 ('a')
11. SUCCESS
['a', 'a', 'a', 'a']
7.1.5. Visualization

Figure 7.1. Visualization for pattern r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$' [1]
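A textual counterpart to the figure: the same pattern rewritten with `re.VERBOSE`, where whitespace outside character classes is ignored and `#` starts a comment, so each part can be annotated. This is a sketch; the comments are one reading of the pattern, and the sample addresses are made up for illustration. Both forms give the same results.

```python
import re

# The pattern from Figure 7.1, written on one line
compact = re.compile(r'^[a-zA-Z0-9][\w.+-]*@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]{2,20}$')

# The same pattern with re.VERBOSE: whitespace is ignored, # starts a comment
verbose = re.compile(r"""
    ^
    [a-zA-Z0-9]          # first character: ASCII letter or digit
    [\w.+-]*             # rest of the user name: word characters, dot, plus, minus
    @                    # literal at sign
    [a-zA-Z0-9-]+        # domain name: letters, digits, minus
    \.                   # literal dot
    [a-zA-Z0-9-.]{2,20}  # remainder: 2 to 20 letters, digits, minus or dot
    $
""", re.VERBOSE)

emails = ['alice@example.com', 'a.b+tag@example.co.uk', '.alice@example.com', 'alice@example.c']
results = [(email, bool(compact.match(email)), bool(verbose.match(email))) for email in emails]
for row in results:
    print(row)
# ('alice@example.com', True, True)
# ('a.b+tag@example.co.uk', True, True)
# ('.alice@example.com', False, False)  first character must be a letter or digit
# ('alice@example.c', False, False)     needs at least 2 characters after the dot
```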
7.1.6. Further Reading
https://www.youtube.com/watch?v=BmF-gEYXWVM&list=PLv4THqSPE6meFeo_jNLgUVKkP40UstIQv&index=3
Kinsley, Harrison "Sentdex". Python 3 Programming Tutorial - Regular Expressions / Regex with re. Year: 2014. Retrieved: 2021-04-11. URL: https://www.youtube.com/watch?v=sZyAn2TW7GY
https://www.rexegg.com/regex-trick-conditional-replacement.html