
I am reading through http://docs.python.org/2/library/re.html. According to it, the "r" in Python's re.compile(r'pattern flags') refers to raw string notation:

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.

Would it be fair to say then that:

re.compile(r'pattern') means that "pattern" is a regex, while re.compile('pattern') means that "pattern" is an exact match?

user1592380
  • `r` doesn't have any relationship with regexes. It just makes the pattern string easier to write. – Paulo Bu Jan 14 '14 at 01:21
  • Already great answers below, but just to clarify: the answer is **no** to your last question which I'll paraphrase as *does `re.compile` without the `r` mean an exact match?* – Mike Williamson Jun 20 '18 at 17:26

3 Answers


As @PauloBu stated, the r string prefix is not specifically related to regexes, but to strings in Python generally.

Normal strings use the backslash character as an escape character for special characters (like newlines):

>>> print('this is \n a test')
this is 
 a test

The r prefix tells the interpreter not to do this:

>>> print(r'this is \n a test')
this is \n a test
>>> 

This matters in regular expressions because the backslash needs to reach the re module intact. In particular, \b matches the empty string only at the start or end of a word. re expects the two-character string \b, but in a normal string literal '\b' is converted to the ASCII backspace character, so you need to either escape the backslash explicitly ('\\b') or tell Python it is a raw string (r'\b').

>>> import re
>>> re.findall('\b', 'test') # the backslash gets consumed by the python string interpreter
[]
>>> re.findall('\\b', 'test') # backslash is explicitly escaped and is passed through to re module
['', '']
>>> re.findall(r'\b', 'test') # often this syntax is easier
['', '']
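To see what each pattern actually hands to re, you can inspect the strings directly. This is a minimal sketch; the sample text 'a\bb' (which contains a real backspace character) is only an illustration:

```python
import re

# In a normal literal, '\b' is a single backspace character (ASCII 8)
plain = '\b'
# In a raw literal, r'\b' is two characters: a backslash and 'b'
raw = r'\b'

print(len(plain), repr(plain))  # 1 '\x08'
print(len(raw), repr(raw))      # 2 '\\b'

# re receives a literal backspace and searches for that character
print(re.findall(plain, 'a\bb'))  # ['\x08']
# The raw pattern reaches re as \b, the word-boundary assertion
print(re.findall(raw, 'test'))    # ['', '']
```

So the non-raw version is not "broken"; it is simply a different pattern from the one you meant.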
Peter Gibson

No. As the documentation you pasted explains, the r prefix on a string indicates that it is a raw string.

Python string escapes and regex escapes both use the backslash (\) character, so they collide; raw strings give you a way to tell Python that you want the backslashes left unprocessed.

Examine the following:

>>> "\n"
'\n'
>>> r"\n"
'\\n'
>>> print("\n")


>>> print(r"\n")
\n

Prefixing a string with r merely tells Python that backslashes (\) should be treated literally rather than as escape characters.
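As a quick check of that statement, here is a small sketch comparing the lengths and contents of the two literals:

```python
escaped = "\n"    # one character: a newline
raw = r"\n"       # two characters: a backslash and 'n'

print(len(escaped), len(raw))   # 1 2
print(list(raw))                # ['\\', 'n']
# A raw literal is an ordinary str; r"\n" is the same value as "\\n"
print(raw == "\\n")             # True
print(type(raw) is str)         # True
```

The r only changes how the literal is parsed; the resulting object is a plain str.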

Raw strings are helpful when, for example, you are searching on a word boundary. The regex for this is \b; however, to express it in a normal Python string, I'd need to write "\\b" as the pattern. Instead, I can use the raw string r"\b".
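A minimal sketch showing that both spellings produce the same pattern, used here to find whole-word matches (the sample sentence is made up for illustration):

```python
import re

# Both spellings produce the same two-character string: backslash + 'b'
assert "\\b" == r"\b"

text = "cat concat cat"
# \b on each side stops the 'cat' inside 'concat' from matching
raw_hits = re.findall(r"\bcat\b", text)
escaped_hits = re.findall("\\bcat\\b", text)  # same pattern, noisier spelling

print(raw_hits)      # ['cat', 'cat']
print(escaped_hits)  # ['cat', 'cat']
```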

Raw strings become especially handy when you need to match a literal backslash. The regex to match one backslash is \\; to express that in a normal Python string, each backslash must itself be escaped, so the pattern becomes "\\\\", or the much simpler r"\\".
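For instance, splitting a Windows-style path on its backslashes. This is a sketch; the path is only sample data:

```python
import re

path = r"C:\Users\me\notes.txt"   # raw literal, so the backslashes stay as-is

# The regex \\ matches one literal backslash; both spellings are equivalent
escaped_pattern = "\\\\"
raw_pattern = r"\\"
assert escaped_pattern == raw_pattern

print(re.findall(raw_pattern, path))   # ['\\', '\\', '\\']
print(re.split(raw_pattern, path))     # ['C:', 'Users', 'me', 'notes.txt']
```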

As you can guess, the extra backslashes get confusing in longer and more complex regexes, so raw strings are generally considered the way to go.

Community

No. Not everything in regex syntax needs to be preceded by \, so ., *, +, etc. still have their special meaning in a pattern whether or not you use the r prefix.
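To make this concrete: a pattern written without r is still a regex, so metacharacters keep their meaning. If an exact, literal match is what you want, the standard library's re.escape neutralizes them. A sketch, assuming Python 3.4+ (for re.fullmatch):

```python
import re

# No r prefix, but '.' is still the regex "any character" metacharacter
print(bool(re.fullmatch('a.c', 'abc')))   # True
print(bool(re.fullmatch('a.c', 'a.c')))   # True

# re.escape backslash-escapes the '.', so only the literal text matches
literal = re.escape('a.c')
print(literal)                               # a\.c
print(bool(re.fullmatch(literal, 'abc')))    # False
print(bool(re.fullmatch(literal, 'a.c')))    # True
```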

The r'' prefix is often used as a convenience for regexes that need a lot of \, since it avoids the clutter of doubling up every \.

John La Rooy