
I'm a little confused about how raw strings interact with Python regex searches. My code snippet is:

import re
text = r"aaa\r\nbbb\r\nccc\r\n"    # I want to replace '\r\n' with '\n'
text1 = re.sub(r"\r\n", "\n", text)    # but text1 remains identical to text
text2 = re.sub(r"\\r\\n", "\n", text)  # while text2 gives what I expected

As I understand it, text is stored in memory as "aaa\\r\\nbbb\\r\\nccc\\r\\n", where each "\\" is a single '\' character. So text1 should have worked as I wanted, because r"\r\n" is stored as "\\r\\n" in memory and should match text. However, it does not.

What is wrong with my understanding of raw strings in this situation? Thanks.

Cuteufo
  • Regex escape sequence `\r` matches CR, `\n` matches LF chars. You can match them either with literal CR/LF or regex escapes. – Wiktor Stribiżew Jun 09 '20 at 21:37
  • `text = r"aaa\r\nbbb\r\nccc\r\n"` is the literal text `aaa\r\nbbb\r\nccc\r\n` in memory. To match a literal backslash followed by `r`, the regex engine has to see `\\r`, because the backslash is a special (escape) character to regex engines. So to match a literal `\`, you tell the engine to match `\\`. None of this has anything to do with the language's string parser. –  Jun 09 '20 at 21:54
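
The two layers described in the comments above (the Python string literal, then the regex engine) can be checked directly. This is only a minimal sketch: the names `literal_pattern`, `text2`, `text3`, and `text4` are illustrative, and the `re.escape` and `str.replace` variants are alternatives added here, not something the original snippet used.

```python
import re

text = r"aaa\r\nbbb\r\nccc\r\n"  # raw literal: backslash, r, backslash, n are kept as-is

# Layer 1: the Python string literal.
# A raw literal keeps both backslashes, so r"\r\n" is four characters long,
# while the ordinary literal "\r\n" is two characters (CR and LF).
literal_pattern = r"\r\n"
print(len(literal_pattern))  # 4
print(len("\r\n"))           # 2

# Layer 2: the regex engine.
# The engine reads the four characters \r\n as its own escapes for CR and LF,
# so the pattern looks for real control characters, which text does not contain.
print(re.search(literal_pattern, text))          # None
print(re.search(literal_pattern, "aaa\r\nbbb"))  # a match object

# To match a literal backslash, the engine has to see two backslashes.
text2 = re.sub(r"\\r\\n", "\n", text)

# Alternative (assumption: not in the original post): let re.escape build the pattern.
text3 = re.sub(re.escape(literal_pattern), "\n", text)

# Alternative without regex: str.replace does no escape processing at all.
text4 = text.replace(literal_pattern, "\n")

print(text2 == text3 == text4)  # True
```

If no regex features are needed, the plain `str.replace` call is probably the simplest option, since it compares characters exactly as they are stored.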
