Perform full-word substring replacement with pandas str.replace

Question

I have a sample data frame whose text column contains strings that include the word 'eng' and the word 'engine'.

ID  Text
1   eng is here
2   engine needs washing
3   eng is overheating

I want to replace the word 'eng' with the word 'engine'. I use the code below:

df['Text'] = df['Text'].str.replace('eng', 'engine')

But this messes up my text in my second row. The second row becomes

ID  Text
2   engineine needs washing

Is there a way to do the replacement so that it only applies when the entire word is 'eng'?

cs95 · Answer 1 · 2019-01-02T15:52:09.253

10

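Wrap your keyword in the word-boundary anchor \b. On pandas 2.0 and later, also pass regex=True, because str.replace treats the pattern as a literal string by default:

df['Text'].str.replace(r'\beng\b', 'engine', regex=True)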

0           engine is here
1     engine needs washing
2    engine is overheating
Name: Text, dtype: object

If you have multiple keywords to replace in this manner, pass a dictionary to replace with the regex=True switch:

repl = {'eng' : 'engine'}
repl = {rf'\b{k}\b': v for k, v in repl.items()}

df['Text'].replace(repl, regex=True)

0           engine is here
1     engine needs washing
2    engine is overheating
Name: Text, dtype: object
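As a rough sketch of the dictionary approach with more than one keyword, the snippet below rebuilds the question's frame and escapes each keyword before wrapping it in \b. The extra rows, the 'temp' keyword, and the names keywords, patterns and result are assumptions made for illustration, not part of the original answer.

```python
import re

import pandas as pd

# Sample frame from the question, plus two extra rows (assumed for illustration).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Text": [
        "eng is here",
        "engine needs washing",
        "eng is overheating",
        "check the eng.",
        "blaheng stays as is",
    ],
})

# Hypothetical keyword map; re.escape guards against regex metacharacters.
keywords = {"eng": "engine", "temp": "temperature"}
patterns = {rf"\b{re.escape(k)}\b": v for k, v in keywords.items()}

# Series.replace applies each regex key and substitutes its value.
result = df["Text"].replace(patterns, regex=True)
print(result.tolist())
```

Note that \b only behaves as expected when a keyword starts and ends with a word character (letter, digit or underscore); a keyword such as 'temp.' would need a different boundary check.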

edited Jan 02 '19 at 15:52

answered Jan 02 '19 at 15:50

cs95


Nice picture! :-) – BENY Jan 02 '19 at 15:51
@W-B, which one :-) coldspeed's pic? – Karn Kumar Jan 02 '19 at 15:51
@W-B Thank you! New year, new profile :) – cs95 Jan 02 '19 at 15:52
Good Pic with New Year Change! Bingo! Happy New Year. – Karn Kumar Jan 02 '19 at 15:54
@pygo tyvm, happy new year! – cs95 Jan 02 '19 at 15:58

BENY · Accepted Answer · 2019-01-02T16:01:57.527

5

Adding a trailing space to both the pattern and the replacement fixes the problem for the sample data, starting from your own code:

df['Text'].str.replace('eng ', 'engine ')

0           engine is here
1     engine needs washing
2    engine is overheating
Name: Text, dtype: object

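Update

Split each string into words, replace only the tokens that are exactly 'eng', and join the words back together. Rows with fewer words than the longest row are padded with empty strings, so they end up with trailing spaces:

df['Text'].str.split(' ', expand=True).replace('eng', 'engine').fillna('').apply(' '.join, axis=1)

0           engine is here
1     engine needs washing
2    engine is overheating
dtype: object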

edited Jan 02 '19 at 16:01

answered Jan 02 '19 at 15:53

BENY


Hmm, this will still match 'blaheng ', when it should match just 'eng'. – cs95 Jan 02 '19 at 15:54
This must include word boundaries to match the exact string. – Karn Kumar Jan 02 '19 at 15:56
And it fails to match cases where the string ends with eng. It's not a bad solution by any means if you know the edge cases, but it is not a very robust one. – Paritosh Singh Jan 02 '19 at 15:56
@coldspeed Yep, you are right, and I have updated the answer. I thought I should delete it, but since many people already saw my mistake I would like to correct it. – BENY Jan 02 '19 at 16:02
@pygo check the update – BENY Jan 02 '19 at 16:02
@ParitoshSingh check the update, and thank you for pointing it out – BENY Jan 02 '19 at 16:03
@W-B, thanks, Happy New Year! – Karn Kumar Jan 02 '19 at 16:04
@pygo you too :> happy new year – BENY Jan 02 '19 at 16:15
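
The two edge cases raised in these comments can be reproduced with plain Python strings. This is only a sketch: Python's built-in str.replace stands in for a literal pandas replacement, and the sample strings and variable names are made up for illustration.

```python
import re

samples = ["eng is here", "blaheng is here", "check the eng"]

# Literal replacement with a trailing space, as in the first version of the answer.
with_space = [s.replace("eng ", "engine ") for s in samples]

# Word-boundary regex for comparison.
with_boundary = [re.sub(r"\beng\b", "engine", s) for s in samples]

for original, a, b in zip(samples, with_space, with_boundary):
    print(f"{original!r}: trailing space -> {a!r}, word boundary -> {b!r}")
```

The trailing-space version rewrites 'blaheng ' and leaves a final 'eng' alone, while the word-boundary version handles both cases.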

score 1 · Answer 3 · answered Jan 02 '19 at 15:51

You could try regular expressions like so:

import re
df['Text'] = df['Text'].map(lambda x: re.sub(r'\beng\b', 'engine', x))

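The \b tokens in this regular expression match word boundaries, so 'eng' is only matched as a whole word: it has to be bounded on each side by a non-word character (a space or punctuation, for example) or by the start or end of the string.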
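
A small sketch of how \b behaves on a few inputs, using only the standard re module. The helper name replace_eng and the sample values are hypothetical; the non-string check is an added assumption so that missing values do not raise a TypeError when the function is used with map.

```python
import re

# Compile once so the pattern is not re-parsed for every row.
ENG_WORD = re.compile(r"\beng\b")


def replace_eng(value):
    """Replace the whole word 'eng'; pass non-string values (e.g. NaN) through."""
    if not isinstance(value, str):
        return value
    return ENG_WORD.sub("engine", value)


samples = ["eng is here", "(eng)", "eng_1", "ENG", None]
replaced = [replace_eng(s) for s in samples]
print(replaced)
```

Punctuation counts as a boundary, but an underscore is a word character, so 'eng_1' is left unchanged, and the match is case-sensitive unless re.IGNORECASE is passed. With a frame like the one in the question, this could be applied as df['Text'].map(replace_eng).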
