
I am facing a strange Python regex issue. The following two strings are supposed to be exactly the same, but somehow they do not match.

import re
print(" \\\"")
print(" "+chr(92)+chr(34)+"")
print(re.search(" \\\"", " "+chr(92)+chr(34)+""))

However, the following does match:

import re
print("\\\"")
print(""+chr(92)+chr(34)+"")
print(re.search("\\\"", ""+chr(92)+chr(34)+""))

Any thoughts on what is going on here?

Qiang Li

1 Answer


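The issue is that the backslash has special meaning both in Python string literals and in regular expressions, so it is processed twice. In " \\\"", Python turns \\ into a single backslash and \" into a double quote, so the regex engine receives three characters: space, backslash, double quote. The regex engine then reads \" as an escaped double quote, so the pattern only matches a space followed directly by a double quote, which does not occur in the target string. You can use a Python raw string, created by prefixing a string literal with 'r' or 'R'. A raw string treats the backslash (\) as a literal character, so the regex engine sees \\ (a literal backslash) followed by \" (a literal double quote).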

import re
print(" \\\"")
print(" "+chr(92)+chr(34)+"")
print(re.search(r" \\\"", " "+chr(92)+chr(34)+""))

Output:

 \"
 \"
<re.Match object; span=(0, 3), match=' \\"'>
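To see why the raw string helps, the sketch below (an illustration, not part of the original answer) checks what each literal actually contains. The non-raw pattern is, character for character, identical to the target string, which is why the two look "exactly the same"; the regex engine then applies its own escape processing on top of that.

```python
import re

# The target: a space, a backslash (chr(92)) and a double quote (chr(34)).
target = " " + chr(92) + chr(34)

# Python's literal parser turns \\ into one backslash and \" into a quote,
# so this pattern string is exactly the same text as the target...
cooked = " \\\""
print(cooked == target, len(cooked))      # True 3

# ...but the regex engine then reads \" as an escaped quote, so the pattern
# means "space followed by a double quote", which the target does not contain.
print(re.search(cooked, target))          # None

# In the raw literal every backslash survives: space, \, \, \, " (5 characters).
# The engine reads \\ as a literal backslash and \" as a literal quote.
raw = r" \\\""
print(len(raw))                           # 5
print(re.search(raw, target).span())      # (0, 3)
```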

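In the second example, print(re.search("\\\"", ""+chr(92)+chr(34)+"")) outputs <re.Match object; span=(1, 2), match='"'>, so only the double quote is matched, not the backslash. The pattern there is just an escaped double quote, which the target happens to contain.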
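A minimal sketch (not from the original answer) that checks both examples directly: the first pattern really does match a space followed by a quote, and the second pattern matches only the quote at index 1.

```python
import re

s = chr(92) + chr(34)        # backslash, double quote

# First pattern: after Python's escaping, the regex is a space followed by \",
# i.e. "space, double quote". It matches that text but not space-backslash-quote.
first = " \\\""
print(re.search(first, ' "'))            # a match spanning (0, 2)
print(re.search(first, " " + s))         # None

# Second pattern: the regex is \" which matches a lone double quote,
# so it skips the backslash at index 0 and matches at index 1.
second = "\\\""
m = re.search(second, s)
print(m.span(), m.group())               # (1, 2) "
```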

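To match the backslash as well, you need to either escape the backslash once more (so that the regex engine receives \\ for a literal backslash) or use a raw string. If you use single quotes around the pattern, the double quote does not need to be escaped at all.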

s = "" + chr(92) + chr(34) + ""
print(re.search("\\\\\"", s))
print(re.search(r"\\\"", s))
print(re.search(r'\\"', s))

Output:

<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
<re.Match object; span=(0, 2), match='\\"'>
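The three patterns are not even the same Python string: the raw double-quoted one keeps an extra backslash before the quote, which the regex engine then treats as an escaped quote. The sketch below (an illustration, not part of the original answer) checks this and also tries re.escape, a standard-library function the original does not mention, which escapes regex-special characters in a literal string for you.

```python
import re

s = chr(92) + chr(34)   # the text to find: backslash, double quote

# The three patterns from the answer, plus re.escape(s), which escapes
# regex-special characters (including the backslash) in a literal string.
patterns = {
    "escaped": "\\\\\"",
    "raw, double quotes": r"\\\"",
    "raw, single quotes": r'\\"',
    "re.escape": re.escape(s),
}

for name, p in patterns.items():
    m = re.search(p, s)
    # len(p) differs between the forms, but every pattern matches all of s.
    print(f"{name:20} len={len(p)} span={m.span()}")
```

Every line should report span=(0, 2). The exact string returned by re.escape can differ between Python versions (3.7 stopped escaping characters such as the double quote), but it matches the same text either way.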

For further details on raw strings and backslashes in Python, see the answers to this question.

CodeMonkey