
I have a dictionary of words I want to replace.

import re

preprocess_pattern = {r" AND ": r" & ",
                      r" O\A ": r" O/A ",
                      r" D\B ": r" O/A ",
                      r" D/B ": r" O/A "}

def preprocess_rules(text):
    for detect_pattern, replace_pattern in preprocess_pattern.items():
        text = re.sub(detect_pattern, replace_pattern, str(text))
    return text

preprocess_rules('AMAZON O\A MICROSOFT')

It gives me a result of 'AMAZON O\\A MICROSOFT', with two backslashes (\\). The O\A was not replaced with O/A. What is causing this?

mathgeek

2 Answers


The backslash (\) is a regex metacharacter, so you need to escape detect_pattern with re.escape before passing it to re.sub:

import re

preprocess_pattern = {r" AND ": r" & ",
                      r" O\A ": r" O/A ",
                      r" D\B ": r" O/A ",
                      r" D/B ": r" O/A "}


def preprocess_rules(text):
    for detect_pattern, replace_pattern in preprocess_pattern.items():
        text = re.sub(re.escape(detect_pattern), replace_pattern, text)
    return text


# Raw string, so \A is not treated as a string escape sequence
res = preprocess_rules(r'AMAZON O\A MICROSOFT')
print(res)

Output

AMAZON O/A MICROSOFT

From the documentation:

Escape special characters in pattern. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
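To see what re.escape does to one of the patterns, here is a minimal sketch. The variable names are only illustrative, and the exact escaped text can differ slightly between Python versions, so no output is shown for it.

```python
import re

pattern = r" O\A "             # the literal text we want to find
escaped = re.escape(pattern)   # backslash (and other special characters) get escaped

# The escaped pattern matches the original literal text exactly.
print(re.fullmatch(escaped, pattern) is not None)

# Used in re.sub, it now finds the literal " O\A " in the input.
result = re.sub(escaped, " O/A ", r"AMAZON O\A MICROSOFT")
print(result)
```

Note that re.escape treats the whole key as literal text, so it only fits when none of the dictionary keys are meant to use regex features.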

Dani Mesejo

Character \ is the "escape" character in a regex. For example:

  • . matches any character (except a newline, by default)
  • \. matches literal dot

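Thus in your regex, O\A does not mean "O, backslash, A". In Python's re module, \A is an anchor that matches only at the start of the string, so the pattern " O\A " can never match in the middle of the text, and this is why nothing is replaced.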

Now, to match the character \ you need to escape it with itself! Replacing O\A with O\\A will work since it matches:

  • O
  • \\: literal \
  • A
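The difference between the two patterns can be checked directly. This is a small sketch using re.search on the string from the question. In Python's re module, \A is the start-of-string anchor, which is why the unescaped pattern finds nothing.

```python
import re

text = r"AMAZON O\A MICROSOFT"  # raw string: contains one literal backslash

# Unescaped: \A is the start-of-string anchor, so this cannot match mid-string.
unescaped = re.search(r" O\A ", text)

# Escaped: \\ in the pattern matches one literal backslash.
escaped = re.search(r" O\\A ", text)

print(unescaped)            # None
print(escaped is not None)  # True
```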

Note that you need to do the same for D\B, since \B is also special in a regex (it matches only where there is no word boundary):

preprocess_pattern = {
 r" AND ": r" & ",
 r" O\\A ": r" O/A ",
 r" D\\B ": r" O/A ",
 r" D/B ": r" O/A "
}
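Putting the corrected dictionary together with the function from the question gives the following sketch. The function body is the question's own; only the two patterns containing a backslash have changed.

```python
import re

# Patterns with the backslash escaped by hand.
preprocess_pattern = {
    r" AND ": r" & ",
    r" O\\A ": r" O/A ",
    r" D\\B ": r" O/A ",
    r" D/B ": r" O/A ",
}

def preprocess_rules(text):
    # Apply each pattern in turn to the (stringified) input.
    for detect_pattern, replace_pattern in preprocess_pattern.items():
        text = re.sub(detect_pattern, replace_pattern, str(text))
    return text

print(preprocess_rules(r"AMAZON O\A MICROSOFT"))  # AMAZON O/A MICROSOFT
```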
urban