7.24. Regex Flag
re.ASCII - perform ASCII-only matching instead of full Unicode matching
re.IGNORECASE - case-insensitive search
re.MULTILINE - ^ and $ match at the start and end of every line, not only of the whole string
re.DOTALL - dot (.) also matches newline characters
re.UNICODE - full Unicode matching for \w, \W, \b, \B, \d, \D, \s and \S (the default for str patterns)
re.DEBUG - display debugging information during pattern compilation
The final piece of regex syntax that Python's regular expression engine offers
is a means of setting the flags. Usually the flags are set by passing them as
additional parameters when calling the re.compile() function, but sometimes
it's more convenient to set them as part of the regex itself. The syntax is
simply (?flags) where flags is one or more of the following:
a - re.ASCII
i - re.IGNORECASE
L - re.LOCALE
m - re.MULTILINE
s - re.DOTALL
u - re.UNICODE
re.DEBUG has no inline letter; it can only be passed as a flags argument.
If the flags are set this way, they should be put at the start of the regex; they match nothing, so their effect on the regex is only to set the flags. The letters used for the flags are the same as the ones used by Perl's regex engine, which is why s is used for re.DOTALL [1].
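The two ways of setting flags described above can be compared side by side. The following is a minimal sketch; the sample text and the variable names are only illustrative and do not come from the original chapter.

```python
import re

text = 'Hello\nWORLD'

# Flags passed as an argument; several flags are combined with |
compiled = re.compile(r'^[a-z]+$', flags=re.IGNORECASE | re.MULTILINE)

# The same two flags written inline at the very start of the pattern
inline = re.compile(r'(?im)^[a-z]+$')

by_argument = compiled.findall(text)
by_inline = inline.findall(text)

print(by_argument)  # ['Hello', 'WORLD']
print(by_inline)    # ['Hello', 'WORLD']
```

Both patterns should behave identically; the inline form is mostly useful when the regex is stored as plain text (for example in a configuration file) and no flags argument can be passed.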
7.24.1. SetUp
>>> import re
7.24.2. IGNORECASE
Short: i
Long: re.IGNORECASE
Case-insensitive search
Has Unicode support, e.g. Ą matches ą
>>> string = 'Hello World'
>>>
>>> re.findall('[a-z]', string)
['e', 'l', 'l', 'o', 'o', 'r', 'l', 'd']
>>>
>>> re.findall('[a-z]', string, flags=re.IGNORECASE)
['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
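The Unicode support mentioned above can be checked with the Polish letters Ą and ą. This is a small sketch with an illustrative sample string, assuming a str (not bytes) pattern:

```python
import re

string = 'Ąą'

# Without the flag only the exact lowercase letter matches
without_flag = re.findall('ą', string)

# With the flag the uppercase counterpart matches as well
with_flag = re.findall('ą', string, flags=re.IGNORECASE)

print(without_flag)  # ['ą']
print(with_flag)     # ['Ą', 'ą']
```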
7.24.3. UNICODE
Short: u
Long: re.UNICODE
On by default for str patterns
Turns on Unicode character support
Works for \w, \W, \b, \B, \d, \D, \s and \S
>>> string = 'cześć' # in Polish language means hello
>>>
>>> re.findall(r'\w', string)
['c', 'z', 'e', 'ś', 'ć']
>>>
>>> re.findall(r'\w', string, flags=re.UNICODE)
['c', 'z', 'e', 'ś', 'ć']
Note that the character range [a-z] covers only the ASCII letters a to z, whether or not the flag is set:
>>> re.findall(r'[a-z]', string)
['c', 'z', 'e']
>>>
>>> re.findall(r'[a-z]', string, flags=re.UNICODE)
['c', 'z', 'e']
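If non-ASCII letters should be matched by a character class, they have to be listed explicitly or expressed through \w. The sketch below shows two possible approaches; the explicit list of Polish letters and the [^\W\d_] idiom are illustrative choices, not something prescribed by the original text.

```python
import re

string = 'cześć'  # 'hello' in Polish

# Plain range: only ASCII letters a to z
ascii_range = re.findall(r'[a-z]', string)

# Range extended with the Polish lowercase letters
polish_range = re.findall(r'[a-ząćęłńóśźż]', string)

# Word characters minus digits and underscore, i.e. letters in any script
unicode_letters = re.findall(r'[^\W\d_]', string)

print(ascii_range)      # ['c', 'z', 'e']
print(polish_range)     # ['c', 'z', 'e', 'ś', 'ć']
print(unicode_letters)  # ['c', 'z', 'e', 'ś', 'ć']
```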
7.24.4. ASCII
Short: a
Long: re.ASCII
Perform ASCII-only matching instead of full Unicode matching
Works for \w, \W, \b, \B, \d, \D, \s and \S
ASCII-only search is faster, but does not include Unicode characters
>>> string = 'cześć' # 'hello' in Polish
>>> re.findall(r'\w', string)
['c', 'z', 'e', 'ś', 'ć']
>>>
>>> re.findall(r'\w', string, flags=re.ASCII)
['c', 'z', 'e']
Note that the character range [a-z] covers only the ASCII letters a to z, whether or not the flag is set:
>>> string = 'cześć' # 'hello' in Polish
>>>
>>> re.findall(r'[a-z]', string)
['c', 'z', 'e']
>>>
>>> re.findall(r'[a-z]', string, flags=re.ASCII)
['c', 'z', 'e']
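The flag affects the other shorthand classes in the same way as \w. As a small sketch, \d can be tested against a digit from another script; the sample string is an assumption chosen for illustration and uses an escape sequence to keep the source unambiguous.

```python
import re

# ASCII digit 7 followed by U+0667 ARABIC-INDIC DIGIT SEVEN
string = '7\u0667'

# Default (Unicode) mode: \d matches decimal digits of any script
unicode_digits = re.findall(r'\d', string)

# ASCII mode: \d is equivalent to [0-9]
ascii_digits = re.findall(r'\d', string, flags=re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['7']
```

This matters when validating input such as numeric identifiers, where accepting digits from other scripts may be unintended.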
7.24.5. MULTILINE
Short: m
Long: re.MULTILINE
Makes ^ and $ match at line boundaries, not only at the boundaries of the whole string
Changes meaning of ^: it now also matches at the start of each line
Changes meaning of $: it now also matches at the end of each line
>>> string = 'Hello\nWorld'
>>>
>>> re.findall('^[A-Z]', string)
['H']
>>>
>>> re.findall('^[A-Z]', string, flags=re.MULTILINE)
['H', 'W']
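The string itself is not changed by the re.MULTILINE flag; only the positions where ^ and $ may match change.
Without the flag the pattern sees a single unit, anchored at its start and end:
Hello\nWorld
With the flag each line is anchored separately:
Hello
World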
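The same effect can be observed for $. The following sketch reuses the sample string from above and also shows that \A, which always means the start of the whole string, is not affected by the flag:

```python
import re

string = 'Hello\nWorld'

# Without the flag, $ matches only at the end of the string
last_default = re.findall(r'[a-z]$', string)

# With the flag, $ also matches just before each newline
last_multiline = re.findall(r'[a-z]$', string, flags=re.MULTILINE)

# \A ignores re.MULTILINE and matches only at the very start
first_absolute = re.findall(r'\A[A-Z]', string, flags=re.MULTILINE)

print(last_default)    # ['d']
print(last_multiline)  # ['o', 'd']
print(first_absolute)  # ['H']
```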
7.24.6. DOTALL
Short: s
Long: re.DOTALL
Dot (.) also matches newline characters
By default newlines are not matched by .
>>> string = 'Hello\nWorld'
>>>
>>> re.findall(r'.', string)
['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
>>>
>>> re.findall(r'.', string, flags=re.DOTALL)
['H', 'e', 'l', 'l', 'o', '\n', 'W', 'o', 'r', 'l', 'd']
Note the '\n' character in the results when the re.DOTALL flag is turned on.
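The difference is easier to see with a greedy quantifier. In this sketch, which reuses the same sample string, .+ stops at the newline by default and runs across it with re.DOTALL:

```python
import re

string = 'Hello\nWorld'

# Default: each match ends at the newline
per_line = re.findall(r'.+', string)

# With re.DOTALL a single match spans both lines
whole_text = re.findall(r'.+', string, flags=re.DOTALL)

print(per_line)    # ['Hello', 'World']
print(whole_text)  # ['Hello\nWorld']
```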
7.24.7. DEBUG
Long: re.DEBUG
Display debugging information during pattern compilation
>>> x = re.compile('^[a-z]+@example.com$', flags=re.DEBUG)
AT AT_BEGINNING
MAX_REPEAT 1 MAXREPEAT
  IN
    RANGE (97, 122)
LITERAL 64
LITERAL 101
LITERAL 120
LITERAL 97
LITERAL 109
LITERAL 112
LITERAL 108
LITERAL 101
ANY None
LITERAL 99
LITERAL 111
LITERAL 109
AT AT_END
0. INFO 4 0b0 13 MAXREPEAT (to 5)
5: AT BEGINNING
7. REPEAT_ONE 10 1 MAXREPEAT (to 18)
11. IN 5 (to 17)
13. RANGE 0x61 0x7a ('a'-'z')
16. FAILURE
17: SUCCESS
18: LITERAL 0x40 ('@')
20. LITERAL 0x65 ('e')
22. LITERAL 0x78 ('x')
24. LITERAL 0x61 ('a')
26. LITERAL 0x6d ('m')
28. LITERAL 0x70 ('p')
30. LITERAL 0x6c ('l')
32. LITERAL 0x65 ('e')
34. ANY
35. LITERAL 0x63 ('c')
37. LITERAL 0x6f ('o')
39. LITERAL 0x6d ('m')
41. AT END
43. SUCCESS
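The ANY entry in the output above comes from the unescaped dot in example.com, which matches any character rather than a literal dot. As a hedged sketch, the debug output can be captured and compared for the escaped and unescaped variants. The helper debug_dump is hypothetical (it is not part of the re module), and the exact text printed by re.DEBUG is an implementation detail that may differ between Python versions.

```python
import contextlib
import io
import re


def debug_dump(pattern):
    """Compile pattern with re.DEBUG and return the text that was printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, flags=re.DEBUG)
    return buffer.getvalue()


unescaped = debug_dump(r'^[a-z]+@example.com$')
escaped = debug_dump(r'^[a-z]+@example\.com$')

# The unescaped dot is reported as ANY, the escaped one as a literal
print('ANY' in unescaped)
print('ANY' in escaped)
```

Reading the dump this way is one possible method of spotting an accidental wildcard in a pattern that was meant to match a fixed string.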