The backslash character in Regex for Python

Question

In the Python documentation for Regex, the author mentions:

regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.

He then goes on to give an example of matching \section in a regex:

to match a literal backslash, one has to write '\\\\' as the RE string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. In REs that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.
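To make the two layers of escaping concrete, here is a minimal sketch (the sample text is a made-up string, not taken from the documentation). The Python literal '\\\\section' becomes a 9-character string, and the regex engine then reads its leading \\ as "match one literal backslash".

```python
import re

# Hypothetical sample text containing one literal backslash: \section
text = "see \\section for details"  # the literal "\\" is a single backslash character

# Layer 1 (Python): the literal '\\\\section' becomes the 9-character string \\section
pattern = '\\\\section'
print(len(pattern))  # 9

# Layer 2 (regex): the engine reads \\ as "match one literal backslash"
match = re.search(pattern, text)
print(match.group())  # \section
```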

He then says that the solution to this "backslash plague" is to begin a string with r to turn it into a raw string.
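A minimal sketch of the raw-string alternative, reusing the same hypothetical sample text: r'\\section' produces exactly the same string as '\\\\section', so the regex engine sees an identical pattern while the source needs half as many backslashes.

```python
import re

text = "see \\section for details"  # hypothetical sample text

# A raw string keeps every backslash as written, so r'\\section' is the same
# 9-character string as the regular literal '\\\\section'.
raw_pattern = r'\\section'
plain_pattern = '\\\\section'
print(raw_pattern == plain_pattern)  # True

# Both spellings therefore match the same literal \section in the text.
print(re.search(raw_pattern, text).group())  # \section
```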

Later, though, he gives this example of using a regex:

import re
p = re.compile('\d+')
p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')

which results in:

['12', '11', '10']

I am confused as to why we did not need to put an r before '\d+' in this case. Based on the earlier explanation of backslashes, I thought we would need to tell Python that the backslash in this string is not a Python escape character.

You shouldn't have used `'\d'`; it should be `'\\d'`. Remember, `\` is an escape character in Python strings, and the only reason that worked is that `\d` isn't a recognized escape, so Python treated the `\` like an ordinary character. But it's reckless and prone to breaking in the future. — Tom Karzes, Apr 10 '20 at 16:54
Note that OP did not write that code, but just noticed the inconsistency. — Arne, Apr 10 '20 at 17:02
You can also check `p.pattern`, which gives `'\\d+'`, showing that in this case the escape gives the intended result, but that's not true for all escape sequences. Best practice is to use raw strings for all regexes. — ggorlen, Apr 10 '20 at 17:32
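A small illustration of why the fallback is not safe in general (the sample text is invented for this sketch): \b is a recognized Python escape (the backspace character), so in a regular literal it never reaches the regex engine as a word-boundary assertion, while the raw-string version does. The last line checks the p.pattern observation for a raw-string \d+.

```python
import re

text = "the cat sat"  # hypothetical sample text

# '\b' IS a recognized Python escape: it becomes the backspace character (\x08),
# so the regex engine never sees a word-boundary assertion and nothing matches.
plain = re.compile('\bcat\b')
print(plain.search(text))  # None

# With a raw string, the engine receives backslash + 'b', i.e. a word boundary.
raw = re.compile(r'\bcat\b')
print(raw.search(text).group())  # cat

# The compiled pattern keeps the backslash, as the comment describes.
print(re.compile(r'\d+').pattern == '\\d+')  # True
```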

score 3 · Accepted Answer · answered Apr 10 '20 at 16:58

Python only recognizes some sequences starting with \ as escape sequences. For example, \d is not a recognized escape sequence, so in this particular case there is no need to escape the backslash to keep it in the string.

(In Python 3.6) "\d" and "\\d" are equivalent:

>>> "\d" == "\\d"
True
>>> r"\d" == "\\d"
True

Here is a list of all the recognized escape sequences: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
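A minimal sketch of the distinction the answer draws, followed by the question's example rewritten with a raw string. Relying on the unrecognized-escape fallback is deprecated: Python 3.12 and later emit a SyntaxWarning for "\d" in a regular string literal, so the sketch only uses the "\\d" and r"\d" spellings.

```python
import re

# A recognized escape such as \n collapses into a single character...
print(len("\n"))   # 1
# ...whereas the raw form keeps both the backslash and the letter.
print(len(r"\n"))  # 2

# \d is not a recognized escape; spelling it "\\d" or r"\d" gives the same
# two characters without relying on the fallback behaviour.
print("\\d" == r"\d")  # True

# The question's example, written with a raw string.
p = re.compile(r'\d+')
print(p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping'))
# ['12', '11', '10']
```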
