I'm following the accepted answer in this link: Replace all words from word list with another string in python

Despite following the code exactly as described in the above solution, I can't seem to remove the characters from my string. I am not receiving any errors in the console. Could anybody point out what I am doing wrong? Here is a reproducible example. Thank you.

import re

example = "(-) This is an example of a string € which is + not being cleaned // correctly"
prohibited_strings = ["(-)", "€", "+", "//"]
# Same pattern as in the linked answer: every escaped entry is wrapped in \b ... \b
regex_cleaner = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, prohibited_strings)))
example = regex_cleaner.sub("", example)
  • The meaning of `\b` is context dependent. None of your strings are "words": they neither start nor end with word characters, so why did you add word boundaries to the pattern? – Wiktor Stribiżew May 26 '21 at 11:28
  • @WiktorStribiżew Hi Wiktor, this is a small section of my overall code. My real code also includes hundreds of alphabetical words, and my example string is thousands of words long. This is why I added the boundary. The solution above also uses `\b`. – Ciaran O Brien May 26 '21 at 11:30
  • Ok, I see. You cannot use `\b` in this case. See the [linked thread](https://stackoverflow.com/a/45145800/3832970), try unambiguous word boundaries. You might need a custom solution if those listed in that answer do not work fine for you. – Wiktor Stribiżew May 26 '21 at 11:32
  • I do not understand. How do I remove a list with a mix of 'words' and 'non-words' such as `prohibited_strings = ["(-)","€","+","//", "fhd", "thoias","opk"]`? The link you gave does not answer my question, unfortunately. I am working in Python and don't understand the content in that link. @WiktorStribiżew Edit: I will try the link you gave above. Thank you. – Ciaran O Brien May 26 '21 at 11:34
  • Try `r'(?<!\w)(?:{})(?!\w)'.format( "|".join(map(re.escape,prohibited_strings)) )`, see solutions in [this thread](https://stackoverflow.com/questions/29996079/match-a-whole-word-in-a-string-using-dynamic-regex). – Wiktor Stribiżew May 26 '21 at 11:36
  • Fantastic. Your suggestion worked. I will use that code for non-words, and my original attempt for 'words'. @WiktorStribiżew – Ciaran O Brien May 26 '21 at 11:38
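
For anyone landing here later, the suggestion from the comments can be sketched roughly as below. `\b` only matches between a word character and a non-word character, so `\b\+\b` cannot match a `+` that has spaces on both sides. The lookarounds `(?<!\w)` and `(?!\w)` only require that no word character is adjacent, which is true for both kinds of entry. The function name `remove_prohibited`, the longest-first sort, and the whitespace clean-up at the end are my own assumptions for illustration and are not part of the comment.

```python
import re


def remove_prohibited(text, prohibited):
    """Remove every prohibited string that is not glued to a word character.

    Hypothetical helper: the name and the whitespace clean-up are illustrative.
    """
    # Longest entries first, so a longer entry is tried before a shorter
    # one that shares its prefix (assumption, not in the original comment).
    ordered = sorted(prohibited, key=len, reverse=True)
    alternatives = "|".join(map(re.escape, ordered))
    # (?<!\w) / (?!\w): no word character directly before / after the match.
    pattern = re.compile(r"(?<!\w)(?:{})(?!\w)".format(alternatives))
    cleaned = pattern.sub("", text)
    # Collapse the gaps left behind by the removed tokens.
    return re.sub(r"\s+", " ", cleaned).strip()


example = "(-) This is an example of a string € which is + not being cleaned // correctly"
prohibited_strings = ["(-)", "€", "+", "//", "fhd", "thoias", "opk"]

cleaned = remove_prohibited(example, prohibited_strings)
print(cleaned)

# Why the original pattern found nothing: no word boundary exists
# between a space and "+", so \b fails on both sides.
print(re.search(r"\b\+\b", "is + not"))  # None
```

As far as I can tell, the same pattern also handles the purely alphabetical entries (`"fhd"` is removed from `"abc fhd def"` but left alone inside `"xfhdx"`), so a separate `\b` pass for 'words' may not be needed.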