10

I have a user-entered string, and I want to replace every occurrence of any word from a list with my replacement string.

import re

prohibitedWords = ["MVGame","Kappa","DatSheffy","DansGame","BrainSlug","SwiftRage","Kreygasm","ArsonNoSexy","GingerPower","Poooound","TooSpicy"]


# word[1] contains the user entered message
themessage = str(word[1])    
# would like to implement a foreach loop here but not sure how to do it in python
for themessage in prohibitedwords:
    themessage =  re.sub(prohibitedWords, "(I'm an idiot)", themessage)

print themessage

The above code doesn't work; I'm sure I don't understand how Python for loops work.

Zac

4 Answers

38

You can do that with a single call to sub:

big_regex = re.compile('|'.join(map(re.escape, prohibitedWords)))
the_message = big_regex.sub("repl-string", str(word[1]))

Example:

>>> import re
>>> prohibitedWords = ['Some', 'Random', 'Words']
>>> big_regex = re.compile('|'.join(map(re.escape, prohibitedWords)))
>>> the_message = big_regex.sub("<replaced>", 'this message contains Some really Random Words')
>>> the_message
'this message contains <replaced> really <replaced> <replaced>'

Note that calling str.replace in a loop may lead to subtle bugs, because each pass also rewrites text that an earlier replacement inserted:

>>> words = ['random', 'words']
>>> text = 'a sample message with random words'
>>> for word in words:
...     text = text.replace(word, 'swords')
... 
>>> text
'a sample message with sswords swords'

whereas a single re.sub call scans the original text only once and gives the correct result:

>>> big_regex = re.compile('|'.join(map(re.escape, words)))
>>> big_regex.sub("swords", 'a sample message with random words')
'a sample message with swords swords'

As thg435 points out, if you want to replace whole words rather than every matching substring, you can add word boundaries to the regex:

big_regex = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, words)))

This would replace 'random' in 'random words' but not in 'pseudorandom words'.
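The word-boundary variant can be wrapped in a small helper. This is only a sketch: the function name `censor` and its parameters are hypothetical, and the non-capturing group `(?:...)` is a stylistic alternative to joining on `\b|\b`, not something the original regex requires.

```python
import re

def censor(message, banned_words, replacement):
    """Replace whole-word occurrences of any banned word in a single pass.

    `censor` and its parameter names are illustrative only.
    """
    if not banned_words:
        # An empty alternation would match at every word boundary.
        return message
    # Escape each word so characters such as '.' or '+' match literally.
    alternatives = '|'.join(map(re.escape, banned_words))
    # One pair of \b anchors around a non-capturing group covers every alternative.
    pattern = re.compile(r'\b(?:%s)\b' % alternatives)
    return pattern.sub(replacement, message)

print(censor('random words, pseudorandom words', ['random', 'words'], 'swords'))
# prints: swords swords, pseudorandom swords
```

Because the text is scanned once, the inserted 'swords' is never matched again, and 'pseudorandom' is left alone because there is no word boundary before 'random' inside it.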

Bakuriu
  • You'd have to break it up if you had lots of words to replace, though. – DSM Mar 27 '13 at 12:15
  • You might want to enclose your expression in `\b`'s to avoid replacing "tail" in "retailers". – georg Mar 27 '13 at 12:31
  • I get a weird repeated string when I use this (the entire line prints twice) – Zac Mar 27 '13 at 12:32
  • @Zac It works well for me. Can you edit your answer and show what you are doing and the output you obtain? – Bakuriu Mar 27 '13 at 12:38
  • It works after a restart, there was some weird code going on after I'd done a previous syntax error and python had crashed. But after the restart the code works great. Thx. – Zac Mar 27 '13 at 14:37
  • this does not work for words, it works for characters, i.e. if you add 'e' to prohibitedWords, the result is `'this mssag contains rally ` – sergiuz Nov 02 '17 at 15:01
  • @sergiuz I don't get what you mean by your comment. There is no difference between words and letters except that words generally contain more than one letter. If you add a one-letter word to the list of words, of course it will be replaced in the same way as the other words. If you have different requirements, ask a new question that clearly states what's different from this one. – Bakuriu Nov 02 '17 at 15:13
  • You are right, but in my opinion this is what the OP asked: to replace words, not chars. This is how I ended up here. Thanks for your answer! – sergiuz Nov 03 '17 at 07:55
  • @sergiuz I want to point out two things: 1) Did you read the last few lines of my answer? They show how to avoid replacing "words" inside substrings. 2) The OP accepted my answer, and you can read his own comments right above yours, so your claims are false. – Bakuriu Nov 03 '17 at 10:36
6

try this:

prohibitedWords = ["MVGame","Kappa","DatSheffy","DansGame","BrainSlug","SwiftRage","Kreygasm","ArsonNoSexy","GingerPower","Poooound","TooSpicy"]

themessage = str(word[1])    
# Use a fresh loop variable and the correctly cased list name.
for bad_word in prohibitedWords:
    themessage = themessage.replace(bad_word, "(I'm an idiot)")

print themessage
Artsiom Rudzenka
  • This is brittle: as Bakuriu explained, it easily breaks when one of the prohibited words is a substring of another. – Adam Mar 27 '13 at 12:19
  • @codesparkle That doesn't mean it is wrong; you always choose your option depending on the conditions. – Artsiom Rudzenka Mar 27 '13 at 12:25
1

Based on Bakuriu's answer, a simpler way to use re.sub would be like this:

import re

words = ['random', 'words']
text = 'a sample message with random words'

# The pattern is written out by hand here; it lists the same entries as `words`.
new_sentence = re.sub("random|words", "swords", text)

The output is "a sample message with swords swords"

foodog123
0

Code:

prohibitedWords =["MVGame","Kappa","DatSheffy","DansGame",
                  "BrainSlug","SwiftRage","Kreygasm",
                  "ArsonNoSexy","GingerPower","Poooound","TooSpicy"]
themessage = 'Brain'   
self_criticism = '(I`m an idiot)'
final_message = [i.replace(themessage, self_criticism) for i in prohibitedWords]
print final_message

Result:

['MVGame', 'Kappa', 'DatSheffy', 'DansGame', '(I`m an idiot)Slug', 'SwiftRage',
'Kreygasm', 'ArsonNoSexy', 'GingerPower', 'Poooound','TooSpicy']
Vladimir Chub