Python replace function not replacing all words

Question

I have a list of words/phrases that I want to filter out of a string. However, my code doesn't remove every occurrence of them. Why is this happening?

import re

stop_words = ['nbsp']
string = 'applicant nbsp entrepreneur nbsp develop level nbsp export artist nbsp export entrepreneur nbsp record label nbsp nbsp music publisher nbsp nbsp music manager'

for word in stop_words:
    if word in string:
        string = re.sub(" {} ".format(word), " ", string)
print(string)

Running this code produces the following output:

'applicant entrepreneur develop level export artist export entrepreneur record label nbsp music publisher nbsp music manager'

As you can see, 'nbsp' is still in the string. Additionally, my actual list of stop words contains entries that are more than one word long; for example, "sleeping in" is one element. I also deliberately kept the spaces on either side of the word so that single-letter stop words such as "a" are not removed from inside words that contain the letter "a".
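A quick way to see which occurrences survive is to count matches on a short fragment of the string. This sketch uses `re.findall`, which, like `re.sub`, scans for non-overlapping matches. The lookaround pattern `(?<= )nbsp(?= )` is included only for contrast: it checks for the surrounding spaces without consuming them.

```python
import re

text = "label nbsp nbsp music"

# " nbsp " consumes the space on BOTH sides, so after the first match the
# space in front of the second "nbsp" is already used up.
consuming = re.findall(" nbsp ", text)

# Lookbehind/lookahead only check for the spaces, so adjacent
# occurrences can each be matched.
non_consuming = re.findall(r"(?<= )nbsp(?= )", text)

print(consuming)      # [' nbsp ']
print(non_consuming)  # ['nbsp', 'nbsp']
```

Only the back-to-back `nbsp nbsp` pairs are affected, which matches the two leftovers in the output above.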

Because there is only a single space between words, your regex can't match consecutive occurrences: the trailing `" "` consumes the space after a word, making it unavailable for the next match. — Wiktor Stribiżew, Aug 09 '19 at 13:32
Also, looping with `for word in stop_words:` will be slow on a long list; instead, build a single dynamic pattern that matches whole words, using one of the approaches described in [my answer](https://stackoverflow.com/a/29996092/3832970). It looks like you need whitespace boundaries. — Wiktor Stribiżew, Aug 09 '19 at 13:34
As per [my post](https://stackoverflow.com/a/29996092/3832970), `string = re.sub(r'(?<!\S)(?:{})(?!\S)'.format("|".join(map(re.escape, stop_words))), '', string)` — Wiktor Stribiżew, Aug 09 '19 at 13:36
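Putting the comments' suggestion together, here is a minimal sketch of a helper that builds one whitespace-bounded pattern from the whole list, as in the last comment. `remove_stop_words` is a hypothetical name, not a library function, and it adds two assumptions that are not in the comments: phrases are tried longest first, and leftover runs of whitespace are collapsed to single spaces afterwards.

```python
import re


def remove_stop_words(text, stop_words):
    """Hypothetical helper: drop whole-word/phrase stop words from text."""
    # Longest phrases first, so "sleeping in" is tried before a shorter
    # entry such as "sleeping" that would otherwise win the alternation.
    ordered = sorted(stop_words, key=len, reverse=True)
    # re.escape makes each phrase match literally; (?<!\S) and (?!\S) demand
    # whitespace or a string edge on both sides, so "a" never matches inside
    # a word like "banana", and neighbouring spaces are not consumed.
    pattern = r'(?<!\S)(?:{})(?!\S)'.format("|".join(map(re.escape, ordered)))
    stripped = re.sub(pattern, '', text)
    # Each removal leaves its surrounding spaces behind; collapse them.
    return " ".join(stripped.split())


string = ('applicant nbsp entrepreneur nbsp develop level nbsp export artist nbsp '
          'export entrepreneur nbsp record label nbsp nbsp music publisher nbsp nbsp '
          'music manager')
print(remove_stop_words(string, ['nbsp']))
# applicant entrepreneur develop level export artist export entrepreneur record label music publisher music manager

print(remove_stop_words("I was sleeping in a hotel by a lake", ["sleeping in", "a"]))
# I was hotel by lake
```

The final `" ".join(text.split())` step also turns tabs and newlines into single spaces; if the original whitespace must be preserved, replace it with a narrower clean-up.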
