I am trying to replace a string in a text file so that only exact matches of the string are replaced.

So if the file was:

"word"
"word_1"
"word1"
"wordA"

and I wanted to replace just word with test_1, I would do this:

text.replace("word", "test_1")

But this also replaces word inside every other string that has more text after it, so the file ends up like this:

"test_1"
"test_1_1"
"test_11"
"test_1A"

I want word to be replaced only when it is not followed by a lowercase letter, an uppercase letter, a number or an underscore. The replacement should be restricted to exact matches, so other strings in the file that begin with the same text but continue with letters, numbers or underscores are not affected. If word is followed by any other character, it should still be replaced, for example:

word"
word;
word:

Those should be replaced, because word is not followed by a letter, number or underscore.

I want this so that I can replace the other strings with different things, like:

text.replace("word", "test_1")
text.replace("word_1", "test_2")
text.replace("word1", "test_3")
text.replace("wordA", "test_4")

So that the file will be:

test_1
test_2
test_3
test_4

How would I do this?

juanpa.arrivillaga
nmap256
  • 4
    This sounds like a job for regular expressions! – Max Aug 16 '16 at 20:24
  • 1
    `re.sub(r'\bword\b', 'test_1', text)`? See http://ideone.com/UNEywY – Wiktor Stribiżew Aug 16 '16 at 20:25
  • If anyone adds a regex tag, I will close with http://stackoverflow.com/questions/15863066/python-regular-expression-match-whole-word – Wiktor Stribiżew Aug 16 '16 at 20:28
  • 1
    @WiktorStribiżew done. – juanpa.arrivillaga Aug 16 '16 at 20:30
  • @WiktorStribiżew that is not an exact dup because the OP asks to exclude `;`,`:` etc – Ohad Eytan Aug 16 '16 at 20:32
  • Was just writing an answer, but if you need a regex statement that works for your specs try: `re.sub(r'word(\d|\n|[a-zA-z]|_)+', 'test_' + str(accumulator), text)`. `accumulator` is the number you need to put at the end of `test_` – Mike Aug 16 '16 at 20:34
  • @OhadEytan: *I want it so I can only replace word so that it only replaces it if there are not any lowercase letters, uppercase letters or underscores after it.*. That is exactly what `\b` word boundary does. And `word"` will become `test"` if we use `re.sub(r'\bword\b', 'test', text)`. – Wiktor Stribiżew Aug 16 '16 at 20:41
  • @Mike: Your regex is inappropriate, there is one serious issue with it, and two performance related issues. – Wiktor Stribiżew Aug 16 '16 at 20:42
  • @Wiktor can see one performance issue with the str() call but what else do you see? – Mike Aug 16 '16 at 20:44
  • *Alternations* of *single* character matching patterns inside a *capturing group* that is *`+`-quantified*. That might add to readability, but a non-capturing group is what you mean. If you use a character class, this will get fixed by itself. – Wiktor Stribiżew Aug 16 '16 at 20:47
  • @WiktorStribiżew my mistake :( – Ohad Eytan Aug 16 '16 at 20:49
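
For reference, a minimal sketch of the word-boundary approach suggested in the comments above, run against the punctuation cases from the question. The helper name `replace_exact` is hypothetical, and `re.escape` is added on the assumption that the search string should be treated as literal text. Note that `\b` also requires a boundary before the match, which is slightly stricter than what the question asks for.

```python
import re

def replace_exact(text, old, new):
    """Replace `old` only where it stands alone as a whole word.

    `replace_exact` is a hypothetical helper name, not from the thread.
    """
    # \b matches between a word character ([A-Za-z0-9_]) and a non-word
    # character (or the string edge), so "word_1", "word1" and "wordA"
    # are left untouched.
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function replacement keeps backslashes in `new` from being
    # interpreted as group references.
    return re.sub(pattern, lambda m: new, text)

sample = 'word"\nword;\nword:\nword_1\nword1\nwordA\n'
result = replace_exact(sample, "word", "test_1")
print(result)
```

Only the first three lines of `sample` change; the lines where word is followed by a letter, digit or underscore are printed as they were.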

1 Answer

Try this:

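import re
from functools import reduce

data = """word
word_1
word1
wordA
"""

replacements = {
    "word": "test_1",
    "word_1": "test_2",
    "word1": "test_3",
    "wordA": "test_4",
}

# Plain str.replace would also rewrite the "word" prefix of the other keys,
# so each key is wrapped in \b word boundaries to match whole words only.
print(reduce(
    lambda text, kv: re.sub(r"\b" + re.escape(kv[0]) + r"\b", kv[1], text),
    replacements.items(),
    data,
))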
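
Chaining replacements one after another rescans text that an earlier step has already rewritten, which could misfire if a replacement value ever contained one of the keys. A possible single-pass variant is sketched below: all keys go into one alternation, and each match is looked up in the dictionary. The function name `replace_all` is hypothetical, and sorting the keys longest-first is a precaution rather than a requirement of the sample data.

```python
import re

replacements = {
    "word": "test_1",
    "word_1": "test_2",
    "word1": "test_3",
    "wordA": "test_4",
}

# Longest keys first, so the alternation tries "word_1" before "word".
keys = sorted(replacements, key=len, reverse=True)
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")

def replace_all(text):
    """Hypothetical helper: replace every whole-word key in one pass."""
    # Each match is looked up in the dict; replaced text is never rescanned.
    return pattern.sub(lambda m: replacements[m.group(0)], text)

data = "word\nword_1\nword1\nwordA\n"
print(replace_all(data))
```

Because the pattern is compiled once, the same `replace_all` can be applied to each line or file without rebuilding the regex.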
BPL