28

I have a script that runs over my text and searches and replaces all the sentences I write, based on a database.

The script:

with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        editor.replace(s[0],s[1])

And the Database example:

Event*Evento*
result*resultado*

And so on...

The problem is that I need "whole word only" matching in that script, because it is causing problems.

For example, with Result and Event: after they are replaced with Resultado and Evento, running the script on the text one more time replaces the Resultado and Evento again.

The result after running the script a second time looks like this: Resultadoado and Eventoo.

Just so you know, it's not only Event and Result; there are more than 1000 sentences that I have already set up for the search and replace.

I don't need a simple search and replace for two words, because I'm going to be editing the database over and over for different sentences.

Remi Guan
Renan Cidale

5 Answers

28

You want a regular expression. You can use the token \b to match a word boundary: i.e., \bresult\b would match only the exact word "result."

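import re

with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')  # 'Event*Evento*' -> ['Event', 'Evento', '\n']
        # \b on both sides matches s[0] only as a whole word;
        # re.escape keeps special characters in s[0] from being read as regex syntax
        # (editor is assumed here to be a plain string holding the text)
        editor = re.sub(r"\b%s\b" % re.escape(s[0]), s[1], editor)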
kindall
  • Should I replace my current script with that? And in the database, should I add the \b before each word? – Renan Cidale Jul 18 '13 at 18:21
  • For example, \bresult*\bresultado*? – Renan Cidale Jul 18 '13 at 18:22
  • 2
    Just replace the code you have with this... the script adds the `\b`s so you don't have to have them in the "database". – kindall Jul 18 '13 at 18:23
  • So I replaced the code in my script with the one you wrote and saved it. Then I wrote "Result" in a third tab, added Result*Resultado* to my database, and ran the script, but it didn't work. – Renan Cidale Jul 18 '13 at 18:25
  • 1
    @RenanCidale: Add `\b` before and after each word you want to match, but not to the replacement string. Make sure that you use raw strings (`r'a raw string'`), otherwise `'\b'` is interpreted as a backspace. – Steven Rumbalski Jul 18 '13 at 18:26
  • @kindall: Perhaps change `s[0]` to `re.escape(s[0])` just in case the source data contains characters that can be interpreted as regular expressions. But it could be overkill. – Steven Rumbalski Jul 18 '13 at 18:28
  • I'm feeling so stupid right now. I can't understand anything that you guys are trying to say. – Renan Cidale Jul 18 '13 at 18:32
  • I replaced my script with the code you gave: `import re`, `with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:`, `for l in f:`, `s = l.split('*')`, `editor = re.sub(r"\b%s\b" % s[0] , s[1], editor)`. Then I tried to run the script on my text and it didn't work. – Renan Cidale Jul 18 '13 at 18:36
  • Please don't give up on me! – Renan Cidale Jul 18 '13 at 18:43
  • @RenanCidale did you manage to find a solution for your dictionary problem? – Freedox Mar 01 '18 at 11:03
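
The raw-string point raised in the comments can be checked with a short sketch. The sample text is made up for illustration; the only claim is that a non-raw `'\b'` is the backspace character, so the pattern silently matches nothing, while `r'\b'` reaches the regex engine as a word boundary.

```python
import re

text = "the result and the results"  # hypothetical sample text

# A non-raw '\b' is the backspace character (ASCII 8), not a word boundary.
assert '\b' == '\x08'
assert len(r'\b') == 2  # the raw string keeps the backslash and the 'b'

# Without the r prefix the pattern looks for literal backspace characters.
plain = re.sub('\bresult\b', 'resultado', text)

# With the r prefix the pattern matches "result" as a whole word only.
raw = re.sub(r'\bresult\b', 'resultado', text)

print(plain)
print(raw)
```

If a replacement appears to do nothing at all, a missing `r` prefix is one of the first things worth checking.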
17

Use re.sub:

import re

replacements = {'the': 'a',
                'this': 'that'}

def replace(match):
    # look up the replacement for whichever word was matched
    return replacements[match.group(0)]

# notice that the 'this' in 'thistle' is not matched
print(re.sub('|'.join(r'\b%s\b' % re.escape(s) for s in replacements),
             replace, 'the cat has this thistle.'))

Prints

a cat has that thistle.

Notes:

  • All the strings to be replaced are joined into a single pattern so that the string needs to be looped over just once.

  • The source strings are passed to re.escape so that any special characters in them are matched literally instead of being interpreted as regular expression syntax.

  • The words are surrounded by r'\b' to make sure matches are for whole words only.

  • A replacement function is used so that any match can be replaced.
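
As a rough sketch, the same single-pattern idea could be applied to the `source*target*` lines from the question. The helper names `load_replacements` and `translate` and the in-memory `database_lines` list are assumptions made for illustration (in the real script the lines would come from the opened file), and sorting the keys longest first is an added precaution so that a longer entry is tried before a shorter one that starts the same way.

```python
import re

# Hypothetical sample of database lines in the question's 'source*target*' format.
database_lines = ["Event*Evento*\n", "result*resultado*\n"]

def load_replacements(lines):
    """Build a {source: target} dict from 'source*target*' lines."""
    replacements = {}
    for line in lines:
        parts = line.split('*')
        if len(parts) >= 2 and parts[0]:
            replacements[parts[0]] = parts[1]
    return replacements

def translate(text, replacements):
    """Replace every whole-word occurrence of a key in a single pass."""
    if not replacements:
        return text
    # Longest keys first, so a longer entry wins over a shorter prefix of it.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile('|'.join(r'\b%s\b' % re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

replacements = load_replacements(database_lines)
once = translate("Event result: Eventful results", replacements)
twice = translate(once, replacements)  # running it again changes nothing
print(once)
print(twice)
```

Because "Evento" and "resultado" do not contain "Event" or "result" as whole words, the second run leaves the text unchanged.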

Samie Bencherif
Steven Rumbalski
13

Use re.sub instead of the normal string replace to replace only whole words. That way, even if your script runs again, it will not replace the already replaced words.

>>> import re
>>> editor = "This is result of the match"
>>> new_editor = re.sub(r"\bresult\b","resultado",editor)
>>> new_editor
'This is resultado of the match'
>>> newest_editor = re.sub(r"\bresult\b","resultado",new_editor)
>>> newest_editor
'This is resultado of the match'
DhruvPathak
6

It is very simple: use re.sub instead of replace.

import re
replacements = {r'\bthe\b': 'a',
                r'\bthis\b': 'that'}

def replace_all(text, dic):
    # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    for i, j in dic.items():
        text = re.sub(i, j, text)
    return text

print(replace_all("the cat has this thistle.", replacements))

It will print

a cat has that thistle.
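
One thing to keep in mind with a loop of separate re.sub calls: each call works on the output of the previous one, so if a replacement is itself a key of another entry it can be replaced again. The sketch below uses made-up entries (`cat`, `dog`, `wolf`) to compare the loop with a single combined pattern; it is an illustration of the difference, not a claim about any particular database.

```python
import re

# Hypothetical entries where one replacement is also the key of another entry.
pairs = [("cat", "dog"), ("dog", "wolf")]
text = "cat and dog"

# Sequential: each pair is applied to the output of the previous substitution.
sequential = text
for source, target in pairs:
    sequential = re.sub(r"\b%s\b" % re.escape(source), target, sequential)

# Single pass: one combined pattern, each match in the original text looked up once.
lookup = dict(pairs)
pattern = re.compile("|".join(r"\b%s\b" % re.escape(s) for s in lookup))
single_pass = pattern.sub(lambda m: lookup[m.group(0)], text)

print(sequential)
print(single_pass)
```

Whether the chained behaviour matters depends on the entries; for a database where no translated word is also a source word, both approaches give the same result.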
Sudharsan
0
import re

match = {}  # create a dictionary of words-to-replace and words-to-replace-with

with open("filename", "r") as f:  # the with block closes the file automatically
    data = f.read()  # string of all file content


def replace_all(text, dic):
    for i, j in dic.items():
        # \b on both sides restricts the match to whole words;
        # re.escape keeps special characters in i from being read as regex syntax
        text = re.sub(r"\b%s\b" % re.escape(i), j, text)
    return text


data = replace_all(data, match)
print(data)  # you can copy and paste the result to whatever file you like
Chris Zhu