Regex not correctly replacing expected result python

Question

I have a dictionary of words I want to replace.

preprocess_pattern = {r" AND ": r" & ",
 r" O\A ": r" O/A ",
 r" D\B ": r" O/A ",
 r" D/B ": r" O/A "}

def preprocess_rules(text):

    for detect_pattern, replace_pattern in preprocess_pattern.items():
        text = re.sub(detect_pattern, replace_pattern, str(text))
        
    return text

preprocess_rules('AMAZON O\A MICROSOFT')

It gives me a result of 'AMAZON O\\A MICROSOFT', with two backslashes (\\). The O\A was not replaced with O/A. What is causing this?

Why use regex here if you are not using regular expressions? Do not use `re.sub`, use `.replace`. — Wiktor Stribiżew, Nov 23 '20 at 19:27
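The suggestion in this comment could look roughly like the following. This is a minimal sketch, assuming every rule is a fixed string rather than a real pattern; the names replacements and preprocess_plain are hypothetical, and the mapping is copied from the question as posted.

```python
# Hypothetical literal-replacement table; keys are plain strings, not regexes.
# "\\" in a normal string literal is a single backslash character.
replacements = {
    " AND ": " & ",
    " O\\A ": " O/A ",
    " D\\B ": " O/A ",
    " D/B ": " O/A ",
}


def preprocess_plain(text):
    # str.replace treats both arguments literally, so nothing needs escaping.
    for old, new in replacements.items():
        text = str(text).replace(old, new)
    return text


print(preprocess_plain("AMAZON O\\A MICROSOFT"))  # AMAZON O/A MICROSOFT
```

Because no regex engine is involved, the backslash in the keys has no special meaning here.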

Dani Mesejo · Accepted Answer · 2020-11-23T18:40:32.690

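The backslash is a regex metacharacter (here \A is parsed as the start-of-string anchor rather than a literal backslash and A), so you need to escape detect_pattern using re.escape to match it literally: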

import re

preprocess_pattern = {r" AND ": r" & ",
                      r" O\A ": r" O/A ",
                      r" D\B ": r" O/A ",
                      r" D/B ": r" O/A "}


def preprocess_rules(text):
    for detect_pattern, replace_pattern in preprocess_pattern.items():
        text = re.sub(re.escape(detect_pattern), replace_pattern, text)
    return text


res = preprocess_rules('AMAZON O\A MICROSOFT')
print(res)

Output

AMAZON O/A MICROSOFT

From the documentation:

Escape special characters in pattern. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
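As a possible variation on the same idea, the escaped keys can be joined into one alternation so the text is scanned once. This is only a sketch: the names rules, combined and preprocess_once are hypothetical, and a single pass can behave differently from the loop above when two matches would share a space.

```python
import re

# Same literal rules as in the question; "\\" is one backslash character.
rules = {
    " AND ": " & ",
    " O\\A ": " O/A ",
    " D\\B ": " O/A ",
    " D/B ": " O/A ",
}

# Escape each key so it is matched literally, then join into one alternation.
combined = re.compile("|".join(re.escape(key) for key in rules))


def preprocess_once(text):
    # The callback looks up the exact matched text in the rules table.
    return combined.sub(lambda m: rules[m.group(0)], text)


print(preprocess_once("AMAZON O\\A MICROSOFT AND APPLE"))
# AMAZON O/A MICROSOFT & APPLE
```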

Just what I was looking for. Thanks! :) – mathgeek, Nov 23 '20 at 18:44

score 0 · Answer 2 · answered Nov 23 '20 at 18:37

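The character \ is the escape character in a regex. For example:

. matches any character except a newline
\. matches a literal dot

In your regex, O\A therefore does not mean O, backslash, A. In Python's re module, \A is the start-of-string anchor, so the pattern asks for " O" followed by the start of the string. That can never match, which is why nothing is replaced.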

Now, to match the character \ you need to escape it with itself! Replacing O\A with O\\A will work since it matches:

O
\\: literal \
A

Note that you need to do the same for D\B, since \B is also special in a regex (it asserts a non-word boundary):

preprocess_pattern = {
 r" AND ": r" & ",
 r" O\\A ": r" O/A ",
 r" D\\B ": r" O/A ",
 r" D/B ": r" O/A "
}
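The difference between the two spellings can be checked directly. This is a small sketch, with the variable names text, unescaped and escaped chosen only for illustration:

```python
import re

# "\\" in a normal string literal is a single backslash character.
text = "AMAZON O\\A MICROSOFT"

# \A is the start-of-string anchor, so this pattern cannot match mid-string.
unescaped = re.search(r" O\A ", text)

# \\ in the pattern matches one literal backslash.
escaped = re.search(r" O\\A ", text)

print(unescaped)         # None
print(escaped.group(0))  # the matched text: space, O, backslash, A, space
```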
