2

How should I declare a regular expression?

mergedData = re.sub(r'\$(.*?)\$', readFile, allData)

I'm kind of wondering why the following worked too. I thought I needed to use r'' to pass a regular expression:

mergedData = re.sub("\$(.*?)\$", readFile, allData)

What does "\$" result in in this case? Why? I would have thought "$".

Peter Mortensen
historystamp

3 Answers

6

I thought that I need to use the r'' to pass a regular expression.

An r before a string literal marks it as a raw string: the usual escape sequences such as \n or \r are no longer turned into a newline or carriage return, but stay as \ followed by n or r. To put a \ in the string, a single \ is enough in a raw string literal, while a normal string literal needs it doubled as \\. This is why regular expressions are usually written as raw strings (see footnote 1): it makes the code less confusing to read. With a normal string literal you would have to escape twice, once for the string literal and once more for the regex.
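A minimal sketch of the difference, using only the standard re module. The sample text and variable names are made up for illustration; the point is that \b means "backspace" to the string literal but "word boundary" to the regex engine, so the backslash has to survive the first layer.

```python
import re

# In a normal literal, \n is one character (a newline); in a raw literal it is two.
normal_newline = '\n'
raw_newline = r'\n'
lengths = (len(normal_newline), len(raw_newline))

# A backslash meant for the regex engine: doubled in a normal literal,
# written once in a raw literal. Both spell the same two-character start.
same_pattern = ('\\d+' == r'\d+')

# Illustrative sample text (not from the question).
text = 'cat concat'

# Raw literal: the regex engine receives \b and treats it as a word boundary.
with_raw = re.findall(r'\bcat\b', text)

# Normal literal: \b becomes a backspace character before the regex sees it,
# so the pattern looks for backspace + "cat" + backspace and finds nothing.
with_normal = re.findall('\bcat\b', text)

# Normal literal with doubled backslashes behaves like the raw literal.
with_doubled = re.findall('\\bcat\\b', text)

print(lengths, same_pattern, with_raw, with_normal, with_doubled)
```

The doubled form works, but it is the "escape twice" situation described above, which is what the r prefix avoids.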

What does "\$" result in this case? Why? I would have thought "$"

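In a normal Python string literal, if \ does not start a recognized escape sequence, the \ is preserved. Therefore "\$" results in \ followed by $. Newer Python versions warn about such unrecognized escapes (DeprecationWarning since 3.6, SyntaxWarning since 3.12), which is another reason to prefer r''.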

This differs slightly from how C/C++ or JavaScript handle a similar situation: there the \ is treated as an escape for the next character, and only that character remains. So "\$" in those languages is typically interpreted as $ (C and C++ compilers usually warn about the unknown escape).
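The Python side of this can be checked with a short sketch. It evaluates the literal from a string at run time, so the example file itself contains no unrecognized escape. Depending on the Python version, compiling such a literal may emit a warning, which the sketch records instead of printing. The variable names are illustrative only.

```python
import warnings

# Source text of the normal string literal "\$", kept inside a raw string.
literal_source = r'"\$"'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Compile and evaluate the literal exactly as Python would in source code.
    value = eval(literal_source)

# The backslash is preserved: the result equals the doubled-backslash spelling.
kept_backslash = (value == '\\$')
length = len(value)

# Names of any warning categories raised while compiling the literal;
# the list may be empty on older interpreters.
warning_names = [w.category.__name__ for w in caught]

print(repr(value), kept_backslash, length, warning_names)
```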

Footnote

1: There is a small defect in the design of raw strings in Python, though: a raw string literal cannot end with a single backslash. See the question "Why can't Python's raw string literals end with a single backslash?"

Community
nhahtdh
3

The r'...' prefix keeps sequences like '\1' intact. In a regular expression, \1 is a reference to the first group, but without the prefix '\1' is the same as the single character '\x01'.

Generally speaking, in r'...' the backslash won't behave as an escape character.

Try

 re.split('(.).\1', '1x2x3')  # ['1x2x3']

vs.

 re.split(r'(.).\1', '1x2x3') # ['1', 'x', '3']

As '\$' is not an escape sequence in Python, it is literally the same as '\\$'.
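Applied to the call from the question, a rough sketch looks like this. The read_file function and the sample data are hypothetical stand-ins (the question does not show readFile or allData); the callback only needs to accept a match object and return the replacement text.

```python
import re

def read_file(match):
    # Hypothetical stand-in for the question's readFile callback: it receives
    # the match object and returns the text that replaces the whole match.
    return '<contents of ' + match.group(1) + '>'

# Made-up input with two $...$ placeholders.
all_data = 'header $a.txt$ middle $b.txt$ footer'

raw_pattern = r'\$(.*?)\$'
doubled_pattern = '\\$(.*?)\\$'  # normal literal with the backslashes doubled

# Both spellings produce the same pattern string, so re.sub behaves identically.
patterns_equal = (raw_pattern == doubled_pattern)
merged = re.sub(raw_pattern, read_file, all_data)

print(patterns_equal)
print(merged)
```

The raw-string spelling is the one that stays readable as patterns grow and does not depend on how unrecognized escapes are handled.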

Kijewski
1

Just ask the snake:

>>> r'\$(.*?)\$'=='\$(.*?)\$'
True
>>> r'\vert'=='\vert'
False
>>> r'\123'=='\123'
False
>>> r'\#23'=='\#23'
True

Basically, if \x would create an escaped character in C, prefixing the string with r is the same as writing \\x:

>>> r'\123'=='\\123'
True
>>> r'\tab'=='\\tab'
True
the wolf