7.23. Regex Backreference
\g<number>
- backreferencing by position (in a replacement string)
\g<name>
- backreferencing by name (in a replacement string)
(?P=name)
- backreferencing by name (inside a pattern)
Note that a backreference must be written in a raw string or with a doubled backslash
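The note about raw strings can be illustrated with a short sketch. The names `raw_replace` and `escaped_replace` are only illustrative; the point is that both spellings produce the same replacement text, while an unescaped `'\1'` in a regular string is turned into a control character by Python before `re` ever sees it.

```python
import re

text = '<p>Hello World</p>'
pattern = r'<p>(.+)</p>'

# Raw string: the backslash reaches the regex engine untouched.
raw_replace = r'<em>\g<1></em>'

# Regular string: the backslash is doubled so it survives Python's own escaping.
escaped_replace = '<em>\\g<1></em>'

# Both spellings are the same string.
same_text = raw_replace == escaped_replace

# Without a raw string, '\1' is an octal escape (character code 1), not a backreference.
is_control_char = '\1' == '\x01'

result = re.sub(pattern, raw_replace, text)
print(result)
```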
7.23.1. SetUp
>>> import re
7.23.2. Backreference
>>> string = '<p>Hello World</p>'
>>> pattern = r'<(?P<tag>.+)>(?:.+)</(?P=tag)>'
>>>
>>> re.findall(pattern, string)
['p']
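As a rough illustration of what the backreference enforces, the sketch below runs the same pattern against a string whose closing tag does not match the opening one. It also shows the positional form `\1`, which should behave like `(?P=tag)` here because `tag` is the first group. The variable names are illustrative only.

```python
import re

named = r'<(?P<tag>.+)>(?:.+)</(?P=tag)>'
numbered = r'<(.+)>(?:.+)</\1>'  # same idea, recalling group 1 by position

matching = '<p>Hello World</p>'
mismatched = '<p>Hello World</div>'

# The closing tag repeats the captured name, so the pattern matches.
named_hit = re.findall(named, matching)
numbered_hit = re.findall(numbered, matching)

# The closing tag differs from the captured name, so nothing is found.
named_miss = re.findall(named, mismatched)
numbered_miss = re.findall(numbered, mismatched)

print(named_hit, numbered_hit, named_miss, numbered_miss)
```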
7.23.3. Recall Group by Position
\g<number>
- backreferencing by position
>>> string = '<p>Hello World</p>'
>>> pattern = r'<p>(.+)</p>'
>>> replace = r'<strong>\g<1></strong>'
>>>
>>> re.sub(pattern, replace, string)
'<strong>Hello World</strong>'
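One reason to prefer `\g<number>` over the shorter `\number` form in a replacement string is ambiguity when a digit follows the group. The sketch below, using a made-up `'item 7'` string, appends a literal `0` after group 1; the `\10` spelling is read as a reference to group 10, which does not exist in this pattern.

```python
import re

text = 'item 7'
pattern = r'item (\d)'

# \g<1>0 is unambiguous: group 1 followed by a literal '0'.
result = re.sub(pattern, r'item \g<1>0', text)

# \10 is parsed as a reference to group 10, which this pattern does not define.
error_raised = False
try:
    re.sub(pattern, r'item \10', text)
except re.error:
    error_raised = True

print(result, error_raised)
```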
7.23.4. Recall Group by Name
\g<name>
- backreferencing by name
>>> string = '<p>Hello World</p>'
>>> pattern = r'<p>(?P<text>.+)</p>'
>>> replace = r'<strong>\g<text></strong>'
>>>
>>> re.sub(pattern, replace, string)
'<strong>Hello World</strong>'
7.23.5. Example
(?P<tag><.*?>).+(?P=tag)
- matches the text enclosed between two identical tags
(the opening and the closing tag must be exactly the same text)
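A minimal sketch of this pattern in use, with made-up input strings. Because the whole tag, angle brackets included, is captured and recalled, the second occurrence has to be character-for-character identical to the first; an ordinary closing tag such as `</p>` therefore does not satisfy it.

```python
import re

pattern = r'(?P<tag><.*?>).+(?P=tag)'

# The same tag text appears twice, so the pattern matches.
same = re.search(pattern, '<b>bold<b>')
same_tag = same.group('tag')
same_full = same.group(0)

# '<p>' and '</p>' are different texts, so there is no match.
different = re.search(pattern, '<p>Hello World</p>')

print(same_tag, same_full, different)
```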
7.23.6. Use Case - 1
>>> import re
>>>
>>> string = 'On Sun, Jan 1st, 2000 at 12:00 AM Alice <alice@example.com> wrote'
>>>
>>> year = r'(?P<year>\d{4})'
>>> month = r'(?P<month>[A-Z][a-z]{2})'
>>> day = r'(?P<day>\d{1,2})'
>>> pattern = f'{month} {day}(?:st|nd|rd|th), {year}'
Recall group by position:
>>> replace = r'\g<3> \g<1> \g<2>'
>>>
>>> re.sub(pattern, replace, string)
'On Sun, 2000 Jan 1 at 12:00 AM Alice <alice@example.com> wrote'
Recall group by name:
>>> replace = r'\g<year> \g<month> \g<day>'
>>>
>>> re.sub(pattern, replace, string)
'On Sun, 2000 Jan 1 at 12:00 AM Alice <alice@example.com> wrote'
7.23.7. Use Case - 2
>>> import re
>>>
>>>
>>> string = '<p>Hello World</p>'
>>> pattern = r'<(?P<tag>.*?)>(.*?)</(?P=tag)>'
>>>
>>> re.findall(pattern, string)
[('p', 'Hello World')]
7.23.8. Use Case - 3
>>> import re
>>>
>>>
>>> string = '<p>We choose to go to the <strong>Moon</strong></p>'
>>>
>>> pattern = r'<(?P<tagname>[a-z]+)>.*</(?P=tagname)>'
>>> re.findall(pattern, string)
['p']
>>>
>>> pattern = r'<(?P<tagname>[a-z]+)>(.*)</(?P=tagname)>'
>>> re.findall(pattern, string)
[('p', 'We choose to go to the <strong>Moon</strong>')]