I thought the `r` prefix on a pattern made everything in it be interpreted as a string literal, so that I wouldn't need any escaping. But in the case below, I still have to escape the `.` to get a literal match. So what is the purpose of the `r` at the beginning of the regex?

    import re

    pattern = r'.'

    text = "this is. test"

    text = re.sub(pattern, ' ', text)
marlon
  • On my system, the value of `text` becomes all spaces, because the regex character `.` matches any character. So `.` isn't being interpreted literally. Is that the intent? – Nick ODell Jul 28 '22 at 02:31
  • The purpose of the "r" is to stop interpretation of backslash escapes. What you need is `pattern = '\\.'`, which can be written `pattern = r'\.'`. A regex that allowed only string literals wouldn't be useful. – Tim Roberts Jul 28 '22 at 04:24
  • You are comparing apples with tractors. A character literal and the regexp *meaning* of that character are two different things. – Marcin Orlowski Jul 28 '22 at 04:44

1 Answer

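The `r` prefix stands for "raw". In a raw string literal, Python does not process backslash escape sequences: each backslash stays in the string as an ordinary character and reaches the regex engine unchanged. The prefix has no effect on regex metacharacters such as `.`, which still need to be escaped (`r'\.'`) to match literally. Consider: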

    print('Hello\b World')   # '\b' is a backspace (U+0008); the exact display depends on the terminal
    print(r'Hello\b World')  # Hello\b World

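In the first example, a normal (non-raw) string, `\b` is interpreted as the backspace control character, which is not displayed as visible text. In the second example, a raw string, `\b` remains the two characters backslash and `b`, which the regex engine interprets as a word boundary.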
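To see the same difference inside an actual regex, here is a minimal sketch. The variable names and the sample text are made up for illustration; only `re.findall` from the standard library is used.

```python
import re

# In a normal string, '\b' is one character: backspace (U+0008).
# In a raw string, r'\b' is two characters: a backslash and 'b'.
normal = '\bcat\b'
raw = r'\bcat\b'
print(len(normal), len(raw))  # 5 7

sample = 'cat concat cat.'  # hypothetical sample text

# The regex engine receives backslash + 'b' and treats it as a word boundary.
print(re.findall(raw, sample))     # ['cat', 'cat']

# Here the pattern is backspace + 'cat' + backspace, which is not in sample.
print(re.findall(normal, sample))  # []
```

The `cat` inside `concat` is skipped by the raw pattern because there is no word boundary in front of it.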

Another example is comparing `'\1'` to `r'\1'`. The former is a single control character (the octal escape for character code 1), while the latter is a backslash followed by `1`, which the regex engine reads as a backreference to the first capture group. To write that backreference without a raw string, double the backslash: `'\\1'`.
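A short sketch tying this back to the question. The doubled-word sentence is a made-up example, and `deduped` and `fixed` are just illustrative names; the `text` value is the one from the question.

```python
import re

# '\1' is the single character chr(1); r'\1' and '\\1' are both
# the two characters backslash + '1'.
print('\1' == chr(1), r'\1' == '\\1')  # True True

# As a regex, r'\1' refers back to whatever group 1 matched.
deduped = re.sub(r'\b(\w+) \1\b', r'\1', 'this is is a test')
print(deduped)  # this is a test

# The r prefix does not change what '.' means to the regex engine.
# To match a literal dot, escape it (or let re.escape build the pattern).
text = "this is. test"
fixed = re.sub(r'\.', ' ', text)
print(repr(fixed))              # 'this is  test'
print(re.escape('.') == r'\.')  # True
```

So the raw prefix only saves you from doubling backslashes in the Python source; escaping regex metacharacters is still up to you.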

Tim Biegeleisen