Create a regex that won't replace substrings

Question

I have the following problem:

Given a dictionary whose keys are strings to look for in another string, and whose values are the strings I want to replace those keys with, I want to replace every key with its value. For example:

replace_dict = {"p": "r"}
text = "p"
text = replace(text, replace_dict)
print(text)  # should output r

Currently I have the following code:

import re
# Longest keys first; escape each key so it is matched literally
pattern = re.compile("|".join(re.escape(k) for k in sorted(rep, key=len, reverse=True)))
ret_string = pattern.sub(lambda m: rep[m.group(0)], ret_string)

This code does the job, but it has one bug: it also replaces substrings inside longer words. For example:

replace_dict = {"p": "p1"}
text = "p=>p1"
text = replace(text, replace_dict)
print(text)  # outputs "p1=>p11", but the output should be "p1=>p1"
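To reproduce the bug, here is a minimal, self-contained sketch that wraps the question's regex approach in the `replace` function used in the examples. The function body is an assumption, since the question only shows the two regex lines, and it adds `re.escape` on the keys so they match literally.

```python
import re


def replace(text, replace_dict):
    """Hypothetical wrapper around the question's approach (no word boundaries)."""
    # Longest keys first: Python's alternation takes the first alternative that
    # matches, so a longer key must come before any shorter key that is its prefix
    keys = sorted(replace_dict, key=len, reverse=True)
    # Escape each key so characters like "." or "+" are matched literally
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Every occurrence of a key is replaced, even inside a longer word
    return pattern.sub(lambda m: replace_dict[m.group(0)], text)


print(replace("p", {"p": "r"}))       # r
print(replace("p=>p1", {"p": "p1"}))  # p1=>p11 (the "p" inside "p1" is replaced too)
```

`re.sub` scans the original string once, so the output is not re-scanned. The extra `1` comes from the second `p` in the input, which sits inside `p1`.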

I'm trying to figure out how to tackle this problem without making my regex too complicated.

Any suggestions?

Thanks

Do you mean that the string must be equal to the key (no other characters) for replacement, or that the key must be a "word" on its own in the string to be replaced (where "word" may need further definition)? — Michael Butscher, Nov 22 '19 at 14:07
In your last example it appears that you actually do want a substring replaced: the first `p` (before the `=>p1`) is a prefix, but still a substring, of your `str = "p=>p1"`. Do you mean you only want the first full match to be replaced? Is `=>` a special separator? — Chris Clayton, Nov 22 '19 at 14:11

score 1 · Accepted Answer · answered Nov 22 '19 at 14:12

What you need are word boundaries, which in regex are written as \b.
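A small illustration of what `\b` does on the question's example string. It lists the start positions where a bare `p` matches and where `\bp\b` matches.

```python
import re

text = "p=>p1"

# Without boundaries, both the standalone "p" and the "p" inside "p1" match
print([m.start() for m in re.finditer(r"p", text)])      # [0, 3]

# \b is a zero-width assertion between a word character (letter, digit, _)
# and a non-word character or the string edge. The "p" in "p1" is followed
# by "1", another word character, so there is no boundary after it.
print([m.start() for m in re.finditer(r"\bp\b", text)])  # [0]
```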

Here is the fixed code:

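import re
# Wrap each (escaped) key in \b so it only matches as a whole word
with_boundaries = [r"\b" + re.escape(k) + r"\b" for k in sorted(rep, key=len, reverse=True)]
pattern = re.compile("|".join(with_boundaries))
# \b is zero-width, so m.group(0) is exactly the matched key
ret_string = pattern.sub(lambda m: rep[m.group(0)], ret_string)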
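As a self-contained sketch, the fix can be wrapped in the same `replace` function the question uses. The function name and signature are assumptions taken from the question's examples. Escaping the keys is an addition so that keys containing regex metacharacters are matched literally.

```python
import re


def replace(text, replace_dict):
    """Replace whole-word occurrences of each key with its value (sketch)."""
    # Longest keys first, as in the question, so longer keys are tried first
    keys = sorted(replace_dict, key=len, reverse=True)
    # \b on both sides: a key only matches when it is not part of a longer word
    pattern = re.compile("|".join(r"\b" + re.escape(k) + r"\b" for k in keys))
    # \b is zero-width, so m.group(0) is exactly one of the dictionary keys
    return pattern.sub(lambda m: replace_dict[m.group(0)], text)


print(replace("p=>p1", {"p": "p1"}))           # p1=>p1
print(replace("p p1", {"p": "r", "p1": "q"}))  # r q
```

One caveat is that `\b` only marks a transition between a word character and a non-word character (or the edge of the string). A key that starts or ends with a non-word character, such as `=>`, will therefore not behave as a whole-word match. For example, `\b=>\b` does not match in `"p => q"`, because `=>` is surrounded by spaces.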
