
Given a sentence and a token that occurs in it, I want to replace the token with a special symbol. For example:

import re

sentence = "Gorjan uses StackOverflow to ask for help."
token = "StackOverflow"
# "Gorjan uses <REPLACED> to ask for help."
replaced = re.sub(rf"(?<!\w)({token})(?!\w)", "<REPLACED>", sentence, count=1)

However, I can't get the replacement to work when the token contains a mix of alphanumeric and special characters. For example:

sentence = "Gorjan uses StackOverflow to ask for help about C++ often."
token = "C++"
# This fails with an error: multiple repeat at position 10
replaced = re.sub(rf"(?<!\w)({token})(?!\w)", "<REPLACED>", sentence, count=1)

Since I'm not very experienced with regex in Python, I would appreciate it if someone could explain what is going on.

Thanks!

gorjan
    Use adaptive word boundaries, and make sure you escape the `token` with `re.escape`, so use `rf"(?!\B\w)({re.escape(token)})(?<!\w\B)"` or variations. – Wiktor Stribiżew Nov 14 '22 at 08:48
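To make the comment's suggestion concrete, here is a minimal sketch of how it could be applied. The helper name `replace_token` and its `placeholder` parameter are made up for illustration; only `re.escape`, `re.sub`, and the pattern from the comment come from the discussion above. The underlying issue is that `+` is a regex quantifier, so an unescaped `C++` is not read as the literal text "C++". On Python versions before 3.11 that raises the "multiple repeat" error, and as far as I know newer versions parse `++` as a possessive quantifier instead, which compiles but still does not match the literal token.

```python
import re


def replace_token(sentence: str, token: str, placeholder: str = "<REPLACED>") -> str:
    """Replace the first standalone occurrence of `token` (hypothetical helper)."""
    # re.escape turns "C++" into "C\+\+", so each "+" is matched literally
    # instead of being parsed as a repetition operator.
    escaped = re.escape(token)
    # Adaptive word boundaries (pattern taken from the comment above): a
    # boundary is only required on a side where the token itself starts or
    # ends with a word character.
    pattern = rf"(?!\B\w)({escaped})(?<!\w\B)"
    # Passing a function as the replacement keeps any backslashes in
    # `placeholder` from being interpreted as group references.
    return re.sub(pattern, lambda _match: placeholder, sentence, count=1)


print(replace_token("Gorjan uses StackOverflow to ask for help.", "StackOverflow"))
# Gorjan uses <REPLACED> to ask for help.
print(replace_token("Gorjan uses StackOverflow to ask for help about C++ often.", "C++"))
# Gorjan uses StackOverflow to ask for help about <REPLACED> often.
```

For a token that starts and ends with letters, such as "StackOverflow", this should behave like the original `(?<!\w)...(?!\w)` lookarounds. The difference only shows up on a side where the token begins or ends with a non-word character, where no boundary is enforced.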
