79

Why is pep8 complaining about the following string in the code?

import re
re.compile("\d{3}")

The warning I receive:

ID:W1401  Anomalous backslash in string: '\d'. String constant might be missing an r prefix.

What does the message mean? What do I need to change in the code so that warning W1401 no longer appears?

The code passes the tests and runs as expected. Moreover, \d{3} is a valid regex.

falsetru
alandarev

2 Answers

114

"\d" is the same as "\\d" because Python string literals have no escape sequence for d. But that is not obvious to a reader of the code.

Now consider \t: "\t" represents a tab character, while r"\t" represents a literal \ followed by the character t.

So use a raw string when you mean a literal \ and d:

re.compile(r"\d{3}")

or escape the backslash explicitly:

re.compile("\\d{3}")
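
These equivalences can be checked directly. Below is a minimal sketch; the variable names are only illustrative, and the non-raw literal is written with a doubled backslash so that the snippet itself does not trigger the warning:

```python
import re

# Both spellings produce the same string: a backslash, "d", then "{3}".
escaped = "\\d{3}"
raw = r"\d{3}"
same_pattern = escaped == raw

# With a recognized escape such as \t, the raw prefix changes the value.
tab = "\t"       # one character: a tab
raw_tab = r"\t"  # two characters: a backslash and "t"

# The regex engine sees the same pattern text either way.
match = re.compile(raw).fullmatch("123")

print(same_pattern, len(tab), len(raw_tab), match is not None)
```

Because the two spellings are the same string, switching to the raw form should only change how the source reads, not what the regex matches.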
falsetru
  • Thanks, I had no idea about the prefix 'r' and its functionality. – alandarev Sep 26 '13 at 14:36
  • 7
    Coming late, but `\d` is not at all the same as `\\d`. The former matches any (Unicode) digit; the latter matches a backslash followed by `d`. They are not equivalent. Pylint seems to be in the wrong here. – Marek Jedliński May 03 '17 at 09:48
  • 5
    @moodforaday, Try `'\d' == '\\d'` in python interactive shell. Also `'\t' == '\\t'` – falsetru May 03 '17 at 10:50
  • 1
    @MarekJedlińsk You are talking about what happens once the string is passed to the regular expression itself, but the linter is talking about strings by themselves, in any context whatsoever. – Cornelius Roemer May 22 '20 at 16:02
5

Python does not recognize '\d' as a string escape sequence; that is why the warning is produced.

The backslash is therefore kept literally and passed down to the regex parser, where \d works fine as an escape sequence for the regex.
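
To make the two layers concrete, here is a small sketch (the variable names are illustrative). It looks at the string as Python stores it, and then at what the regex engine does with it:

```python
import re

pattern = r"\d{3}"

# Layer 1: the Python string. The regex engine receives these characters,
# the first of which is a single literal backslash.
chars = list(pattern)

# Layer 2: the regex engine interprets backslash + "d" as "any digit".
matches_digits = re.fullmatch(pattern, "123") is not None

# Matching a literal backslash followed by "d" needs the backslash
# escaped at the regex level as well.
literal_pattern = r"\\d"
matches_literal = re.fullmatch(literal_pattern, "\\d") is not None
matches_digit = re.fullmatch(literal_pattern, "7") is not None

print(chars, matches_digits, matches_literal, matches_digit)
```

The linter only looks at the first layer; whether the resulting string is a sensible regex is a separate question.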

userA789
  • This answer helped me a lot! We're talking about two different kinds of escape sequences here: 1) for Python strings and 2) for regexes. People coming to this question will be aware of the second meaning, but not the first. But the first is complained about by the linter. The linter is trying to check whether you really meant for that `d` to be a `d` or whether it's not mistyped. To be on the safe side, it assumes `\` is always doubly escaped if it's meant. That way, any real mistake is noticed. – Cornelius Roemer May 22 '20 at 16:10