
I am new to Python. Can someone tell me the difference between these two regex statements: `re.findall(r"\d+", "i am aged 35")` and `re.findall("\d+", "i am aged 35")`?

My understanding was that the raw string in the first statement would make "\d+" inactive, because that is the primary role of a raw string: to make escape characters inactive. In other words, "\d+" would no longer be a metacharacter for matching digits if a raw string is used. However, I now see that both statements return the same result.

1 Answer

Both the Python parser and the regular expression parser handle escape sequences. This means that for any escape sequence that both parsers support, you must either double the backslashes or use a raw string literal so that the Python parser doesn't try to interpret the escape sequence.
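As an illustration of an escape that both parsers understand, consider \b: Python reads it as a backspace character, while the re module reads it as a word boundary. The sketch below is only an example; the sample text and the variable names are made up for the demonstration.

```python
import re

text = "a cat sat"  # hypothetical sample text

# Non-raw literal: Python converts each \b into a backspace character
# before the re module ever sees the pattern.
plain = "\bcat\b"

# Raw literal: the backslashes survive, so re sees \b (word boundary).
raw = r"\bcat\b"

print(len(plain), len(raw))     # 5 7
print(re.findall(plain, text))  # [] (no backspace characters in the text)
print(re.findall(raw, text))    # ['cat']
```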

In this case, \d has no meaning to Python, so the backslash is left in place for the re module to handle. So here specifically, there is no difference between the two snippets.
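A quick way to check this is to compare the two string values directly. In the sketch below the non-raw literal is built with eval, purely so that the warning newer Python versions emit for the unrecognized escape \d can be silenced; writing "\d+" directly in source produces the same value.

```python
import re
import warnings

text = "i am aged 35"
raw = r"\d+"

# Evaluate the non-raw literal "\d+" at run time, with warnings silenced,
# because newer Python versions warn about the unrecognized escape \d.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    plain = eval(r'"\d+"')

print(plain == raw)             # True: both are backslash, d, plus
print(re.findall(plain, text))  # ['35']
print(re.findall(raw, text))    # ['35']
```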

However, if you needed to match a literal backslash before other text such as section in your regular expression, then without raw strings you'd have to use '\\\\section' to define the pattern! That's because the Python parser would turn '\\section' into the value \section (the \\ escape sequence produces a single backslash), and the regular expression parser would then see the start of the escape sequence \s. With a raw string, r'\\section' is enough.
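The backslash case can be checked with a small sketch like the following; the sample text and variable names are invented for the example.

```python
import re

text = r"see \section{Intro}"  # hypothetical text containing a real backslash

too_few = "\\section"    # value is \section, which re reads as \s + "ection"
plain = "\\\\section"    # value is \\section, which re reads as literal \ + "section"
raw = r"\\section"       # same value as plain, written as a raw string

print(plain == raw)               # True
print(len(too_few), len(raw))     # 8 9
print(re.findall(too_few, text))  # [] (no whitespace followed by "ection")
print(re.findall(raw, text))      # ['\\section'] (one backslash, shown escaped)
```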

See the section on backslashes and raw string literals in the Python regular expression HOWTO.

Martijn Pieters
  • Hi there, thank you. I still don't understand why I would need to use '\\\\' to match a literal backslash in a string. – Pankaj Kulkarni Apr 19 '18 at 18:09
  • @PankajKulkarni: `'\section'` is not a valid Python escape, so you get `\section` in the regex, which sees `\s`, a metacharacter. In `'\\section'`, the `\\` becomes a single `\` in the value, so `\section` is again passed to the regex engine, which still sees `\s`, a metacharacter. Only `'\\\\section'` becomes the value `\\section`, so now the regex engine sees `\\` as a literal backslash separate from the `section` part. – Martijn Pieters Apr 19 '18 at 20:08