Search and replace with "whole word only" option

Question

I have a script that runs over my text and searches and replaces all the sentences I write, based on a database.

The script:

with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        editor.replace(s[0],s[1])

And the Database example:

Event*Evento*
result*resultado*

And so on...

What I need now is a "whole word only" option in that script, because I'm running into problems.

For example, with Result and Event: after I replace them with Resultado and Evento and run the script on the text one more time, the script replaces Resultado and Evento again.

After I run the script a second time, the result looks like this: Resultadoado and Eventoo.

Just so you know, it's not only Event and Result; there are more than 1000 sentences that I have already set up for the search and replace.

I don't need a simple search and replace for two words, because I'm going to be editing the database over and over for different sentences.

score 28 · Answer 1 · answered Jul 18 '13 at 18:07

You want a regular expression. You can use the token \b to match a word boundary: i.e., \bresult\b would match only the exact word "result."

import re

with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        editor = re.sub(r"\b%s\b" % s[0] , s[1], editor)
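A minimal sketch of the same loop on a plain Python string, assuming the text is held in a `str` (the `editor` object in the question may be something else, in which case `re.sub` cannot be applied to it directly). The helper name `translate` and the sample lines are made up for illustration; `re.escape` guards against database entries that contain regex metacharacters.

```python
import re

def translate(text, lines):
    """Apply each 'source*target*' line to text, matching whole words only."""
    for line in lines:
        parts = line.rstrip("\n").split("*")
        if len(parts) < 2 or not parts[0]:
            continue  # skip blank or malformed lines
        source, target = parts[0], parts[1]
        # \b on both sides restricts the match to whole words;
        # a function replacement keeps any backslashes in target literal
        text = re.sub(r"\b%s\b" % re.escape(source), lambda m: target, text)
    return text

database = ["Event*Evento*\n", "result*resultado*\n"]  # hypothetical sample lines
once = translate("Event result", database)
twice = translate(once, database)  # running it again changes nothing
print(once)
print(twice)
```

Because "Evento" and "resultado" continue with a word character after "Event" and "result", the second pass finds no whole-word match and leaves the text unchanged.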

kindall

i should replace that in for my currently script? and in the database i should add the \b before each word ? – Renan Cidale Jul 18 '13 at 18:21
for example \bresult*\bresultado* ? – Renan Cidale Jul 18 '13 at 18:22
Just replace the code you have with this... the script adds the `\b`s so you don't have to have them in the "database". – kindall Jul 18 '13 at 18:23
so i replace the code you wrote for the one that was in my script.. then i save.. then i wrote "Result" in a 3rd tab and add the Result*Resultado* into my database.. then i ran the script .. but it didnt work out – Renan Cidale Jul 18 '13 at 18:25
@RenanCidale: Add `\b` before and after each word you want to match, but not at all to the replacement string. Make sure that you use raw strings (`r'a raw string'`), otherwise `'\b'` is interpreted as a backspace. – Steven Rumbalski Jul 18 '13 at 18:26
@kindall: Perhaps change `s[0]` to `re.escape(s[0])` just in case the source data contains characters that can be interpreted as regular expressions. But it could be overkill. – Steven Rumbalski Jul 18 '13 at 18:28
I'm feeling so stupid right now.. I can't understand anything that you guys are trying to say.. – Renan Cidale Jul 18 '13 at 18:32
`import re with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f: for l in f: s = l.split('*') editor = re.sub(r"\b%s\b" % s[0] , s[1], editor)` I replaced my script with the code you gave.. then I tried to run the script on my text.. and it didn't work out – Renan Cidale Jul 18 '13 at 18:36
Please dont give up on me ! – Renan Cidale Jul 18 '13 at 18:43
@RenanCidale did you manage to get a solution for your dictionary problem? – Freedox Mar 01 '18 at 11:03
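
The raw-string point raised in the comments can be checked directly. This is only a small illustration with a made-up sample sentence: without the `r` prefix, Python turns `\b` into a backspace character before the regex engine ever sees it, so the pattern no longer contains a word boundary.

```python
import re

plain = "\bresult\b"   # '\b' here is the backspace character (U+0008)
raw = r"\bresult\b"    # '\b' here reaches the regex engine as a word boundary

text = "the result is in results"  # made-up sample sentence
print(len(plain), len(raw))        # the raw string is two characters longer
print(re.findall(plain, text))     # looks for backspace + "result" + backspace
print(re.findall(raw, text))       # looks for the whole word "result"
```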

score 17 · Answer 2 · edited Oct 19 '16 at 16:00

Use re.sub:

import re

replacements = {'the':'a',
                'this':'that'}

def replace(match):
    return replacements[match.group(0)]

# notice that the 'this' in 'thistle' is not matched
print(re.sub('|'.join(r'\b%s\b' % re.escape(s) for s in replacements),
        replace, 'the cat has this thistle.'))

Prints

a cat has that thistle.

Notes:

All the strings to be replaced are joined into a single pattern so that the string needs to be looped over just once.
The source strings are passed to re.escape to avoid interpreting them as regular expressions.
The words are surrounded by r'\b' to make sure matches are for whole words only.
A replacement function is used so that any match can be replaced.
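
The notes above can be combined with the database format from the question. What follows is a rough sketch rather than a definitive implementation: the helper names `load_replacements` and `build_pattern` and the sample entries are hypothetical, and sorting keys longest-first is an added assumption so that a multi-word entry wins over a shorter entry starting at the same position.

```python
import re

def load_replacements(lines):
    """Parse 'source*target*' lines into a dict (format taken from the question)."""
    table = {}
    for line in lines:
        parts = line.strip().split("*")
        if len(parts) >= 2 and parts[0]:
            table[parts[0]] = parts[1]
    return table

def build_pattern(table):
    # Longest keys first: alternation tries alternatives left to right,
    # so a multi-word entry is tried before a shorter one at the same spot.
    keys = sorted(table, key=len, reverse=True)
    return re.compile("|".join(r"\b%s\b" % re.escape(k) for k in keys))

# Hypothetical sample entries in the question's format
table = load_replacements([
    "Event*Evento*",
    "result*resultado*",
    "Event result*Resultado do evento*",
])
pattern = build_pattern(table)

# One pass over the text; the function looks up whichever key matched
out = pattern.sub(lambda m: table[m.group(0)],
                  "Event result: one Event, one result, no results")
print(out)
```

Since the text is scanned once, a replacement such as "Evento" is never re-examined by a later entry, which is a second safeguard against the "Eventoo" effect described in the question.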

score 13 · Answer 3 · answered Jul 18 '13 at 18:05

Use re.sub instead of normal string replace to replace only whole words. That way your script, even if it runs again, will not replace the already replaced words.

>>> import re
>>> editor = "This is result of the match"
>>> new_editor = re.sub(r"\bresult\b","resultado",editor)
>>> new_editor
'This is resultado of the match'
>>> newest_editor = re.sub(r"\bresult\b","resultado",new_editor)
>>> newest_editor
'This is resultado of the match'

DhruvPathak

where do i replace that – Renan Cidale Jul 18 '13 at 18:17
just so you know.. the database contains more than 1400 words.. and Result and Event are just examples.. – Renan Cidale Jul 18 '13 at 18:18
To introduce a variable in replacement use: `re.sub(r"\b{}\b".format(variable),"resultado",editor)` where `variable = "result"`. – hafiz031 Oct 12 '21 at 10:57
this is cool for me – lobjc Nov 06 '21 at 13:42

score 6 · Answer 4 · answered Jan 04 '18 at 07:38

It is very simple: use re.sub, not replace.

import re
replacements = {r'\bthe\b':'a',
                r'\bthis\b':'that'}

def replace_all(text, dic):
    # dict.items() works on Python 3 (iteritems() is Python 2 only)
    for i, j in dic.items():
        text = re.sub(i,j,text)
    return text

print(replace_all("the cat has this thistle.", replacements))

It will print

a cat has that thistle.

score 0 · Answer 5 · answered Aug 04 '18 at 20:35

import re

match = {}  # dictionary mapping each word to replace to its replacement

# "filename" is a placeholder for the file to process
with open("filename", "r") as f:
    data = f.read()  # string of all file content


def replace_all(text, dic):
    for i, j in dic.items():
        # \b on both sides restricts the match to whole words;
        # re.escape keeps regex metacharacters in i literal
        text = re.sub(r"\b%s\b" % re.escape(i), j, text)
    return text


data = replace_all(data, match)
print(data)  # you can copy and paste the result to whatever file you like

Didn't work for me. It still replaces partial matches. Please check – vineeshvs Dec 20 '18 at 07:45
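
The comment gives no example, so the cause is unknown. One possibility (an assumption, not something stated in the thread) is that `\b` also counts the edge next to a hyphen or apostrophe as a word boundary, so "result" is still matched inside "result-based". The sketch below uses lookarounds instead of `\b`; the helper `replace_whole` and its `extra` parameter are hypothetical names.

```python
import re

def replace_whole(text, source, target, extra="-'"):
    """Replace source only when it is not next to a word char or a char in extra."""
    edge = r"[\w%s]" % re.escape(extra)
    # Negative lookbehind/lookahead: no "edge" character may touch the match
    pattern = r"(?<!%s)%s(?!%s)" % (edge, re.escape(source), edge)
    return re.sub(pattern, lambda m: target, text)

text = "result, result-based, results"  # made-up sample text
with_b = re.sub(r"\bresult\b", "resultado", text)
strict = replace_whole(text, "result", "resultado")
print(with_b)   # the hyphenated word is changed as well
print(strict)   # only the standalone word is changed
```

Whether hyphenated forms should count as whole words depends on the data, so this stricter rule is a choice rather than a fix that applies everywhere.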
