
I have a sample data frame text column that contains strings including the word 'eng' and the word 'engine'.

ID  Text
1   eng is here
2   engine needs washing
3   eng is overheating 

I want to replace the word 'eng' with the word 'engine'. I use the code below:

df['Text'] = df['Text'].str.replace('eng', 'engine')

But this messes up my text in my second row. The second row becomes

ID  Text
2   engineine needs washing

Is there a way to do the replacement so that it applies only when 'eng' appears as a whole word?

cs95
PineNuts0

3 Answers


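Wrap your keyword in the word boundary anchor \b and pass regex=True (since pandas 2.0, str.replace treats the pattern as a literal string by default):

df['Text'].str.replace(r'\beng\b', 'engine', regex=True)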

0           engine is here
1     engine needs washing
2    engine is overheating
Name: Text, dtype: object

If you have multiple keywords to replace in this manner, pass a dictionary to replace with the regex=True switch:

repl = {'eng' : 'engine'}
repl = {rf'\b{k}\b': v for k, v in repl.items()}

df['Text'].replace(repl, regex=True)

0           engine is here
1     engine needs washing
2    engine is overheating
Name: Text, dtype: object
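
As a variation on the dictionary approach above, the keywords can also be combined into one alternation pattern and replaced in a single pass. This is only a sketch, not the method from the answer: the sample frame and the 'temp' entry are made up for illustration, and re.escape is added as a precaution in case a keyword contains regex metacharacters.

```python
import re
import pandas as pd

# Made-up sample data; the 'temp' keyword is a hypothetical addition.
df = pd.DataFrame({"Text": ["eng is here", "engine needs washing", "eng temp is high"]})
repl = {"eng": "engine", "temp": "temperature"}

# One pattern of the form \b(?:eng|temp)\b, with each keyword escaped.
pattern = r"\b(?:" + "|".join(re.escape(k) for k in repl) + r")\b"

# With regex=True, str.replace accepts a callable that receives the match object.
result = df["Text"].str.replace(pattern, lambda m: repl[m.group(0)], regex=True)
print(result.tolist())
```

Because every keyword is matched in one pass, the output of one replacement is never scanned again by another keyword's pattern.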
cs95

Adding a trailing space to both the pattern and the replacement fixes the problem in your own code:

df['Text'].str.replace('eng ', 'engine ')
Out[736]: 
0            engine is here
1      engine needs washing
2    engine is overheating 
Name: Text, dtype: object

Update

df['Text'].str.split(' ', expand=True).replace('eng', 'engine').fillna('').apply(' '.join, axis=1)
Out[752]: 
0           engine is here 
1     engine needs washing 
2    engine is overheating 
dtype: object
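
One caveat worth checking before relying on the trailing-space pattern: it only matches 'eng' when a space follows it, so 'eng' at the end of a string or before punctuation is left alone. The sketch below compares it with a word-boundary pattern on a made-up series (the sample strings are assumptions, not data from the question).

```python
import pandas as pd

# Made-up strings: 'eng' followed by a space, at the end, and before a comma.
s = pd.Series(["eng is here", "check the eng", "eng, then brakes"])

# Literal replacement that requires a trailing space.
spaced = s.str.replace("eng ", "engine ", regex=False)

# Regex replacement anchored on word boundaries.
bounded = s.str.replace(r"\beng\b", "engine", regex=True)

print(spaced.tolist())
print(bounded.tolist())
```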
BENY

You could try regular expressions like so:

import re
df['Text'] = df['Text'].map(lambda x: re.sub(r'\beng\b', 'engine', x))

The \b anchors in this regular expression match word boundaries, so 'eng' is replaced only when it stands alone as a word, for example when it is surrounded by spaces or punctuation or sits at the start or end of the string.
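
To see where \b does and does not match, here is a small sketch using plain re on made-up strings. Note that digits and underscores count as word characters, so 'eng2' and 'eng_1' are not treated as the standalone word 'eng'.

```python
import re

# Made-up test strings covering punctuation, longer words, digits, and underscores.
samples = ["eng", "(eng)", "eng.", "engine", "eng2", "eng_1"]

# Map each input to its result after a whole-word replacement.
results = {s: re.sub(r"\beng\b", "engine", s) for s in samples}

for before, after in results.items():
    print(f"{before!r} -> {after!r}")
```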

Schorsch