List of escape sequences available in Python 3
Those are the escapes Python processes when it parses a string literal.
Any other backslash sequence is left in the string unchanged, backslash included.
So, if you give it a string like '(\d{1,3})\.\1', it leaves \d and \. alone but interprets the \1 as the character with an octal value of 1.
\ooo Character with octal value ooo
So this is what you get
>>> import re
>>> ipAddressString = "192.192.10.5/24"
>>> hh = re.search('(\d{1,3})\.\1',ipAddressString)
>>> print (hh)
None
>>> print ('(\d{1,3})\.\1')
(\d{1,3})\.☺
The regex engine sees this: (\d{1,3})\.☺
(the ☺ is just how this console displays the character with value 1),
which is not an error, but it doesn't match what you want.
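To make the string-parsing step concrete, here is a minimal sketch that compares the plain and raw forms of the same two characters. The variable names plain and raw are only illustrative.

```python
# In a normal string literal, \1 is an octal escape:
# a single character with code point 1.
plain = '\1'

# In a raw string literal the backslash is kept:
# two characters, a backslash followed by '1'.
raw = r'\1'

print(len(plain), ord(plain))   # 1 1
print(len(raw), raw)            # 2 \1
print(plain == '\x01')          # True
```

Only the second form reaches the regex engine as something it can read as a back reference.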
Ways around this:
- Escape the backslash in front of the 1
'(\d{1,3})\.\\1'
- Make the string a raw string,
either with double quotes r"(\d{1,3})\.\1"
or with single quotes r'(\d{1,3})\.\1'
Using the first method we get:
>>> import re
>>> ipAddressString = "192.192.10.5/24"
>>> hh = re.search('(\d{1,3})\.\\1',ipAddressString)
>>> print (hh)
<re.Match object; span=(0, 7), match='192.192'>
>>> print ('(\d{1,3})\.\\1')
(\d{1,3})\.\1
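The second method, the raw string, gives the same result. A short sketch, reusing the address from the session above (the snake_case variable names are my own):

```python
import re

ip_address_string = "192.192.10.5/24"

# Raw string: Python passes every backslash through untouched,
# so the regex engine receives \d, \. and \1 exactly as written.
pattern = r'(\d{1,3})\.\1'

match = re.search(pattern, ip_address_string)
print(match.group(0))   # 192.192
print(match.span())     # (0, 7)
print(pattern)          # (\d{1,3})\.\1
```

The raw string is usually the easier habit for regex patterns, since nothing in the pattern has to be double-escaped.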
As a side note, most regex engines also recognize octal sequences.
To tell an octal from a back reference, an engine usually requires a leading \0
followed by a 2 or 3 digit octal (\0000-\0377, for example),
but some engines don't require it and will accept both forms.
Thus, there is a gray area of overlap.
Some engines will flag the item (for example \2) when they find
an ambiguity, then, when finished parsing the regex, revisit it
and mark it as a back reference if the group exists, or as an octal
if it doesn't. Perl is famous for this.
In general, each engine handles the issue of octal vs back reference
in its own bizarre way. It's always a gotcha waiting to happen.
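As one concrete case, this is how Python's own re module appears to draw the line, going by its documentation: a leading 0 or three octal digits means an octal escape, anything else is a group reference. The sketch below is only a quick check of that rule; the variable names are hypothetical, and other engines will behave differently.

```python
import re

# One digit, not starting with 0: a back reference to group 1.
backref = re.fullmatch(r'(a)\1', 'aa')

# Leading 0: an octal escape, here the character with code point 1.
octal_leading_zero = re.fullmatch(r'\01', '\x01')

# Three octal digits: also an octal escape, here a newline (octal 012 = 10).
octal_three_digits = re.fullmatch(r'\012', '\n')

# A reference to a group that does not exist is rejected at compile time
# rather than silently falling back to an octal.
try:
    re.compile(r'\1')
    missing_group_error = None
except re.error as exc:
    missing_group_error = exc

print(backref, octal_leading_zero, octal_three_digits)
print(missing_group_error)
```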