Replace whole string if it contains substring in pandas

Question

I want to replace all strings that contain a specific substring. So for example if I have this dataframe:

import pandas as pd
df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice'], 
                   'sport': ['tennis', 'football', 'basketball']})

I could replace football with the string 'ball sport' like this:

df.replace({'sport': {'football': 'ball sport'}})

What I want though is to replace everything that contains ball (in this case football and basketball) with 'ball sport'. Something like this:

df.replace({'sport': {'[strings that contain ball]': 'ball sport'}})

EdChum · Accepted Answer · 2016-09-29T12:05:40.963

91

You can use str.contains to mask the rows that contain 'ball' and then overwrite with the new value:

In [71]:
df.loc[df['sport'].str.contains('ball'), 'sport'] = 'ball sport'
df

Out[71]:
    name       sport
0    Bob      tennis
1   Jane  ball sport
2  Alice  ball sport

To make it case-insensitive, pass `case=False`:

df.loc[df['sport'].str.contains('ball', case=False), 'sport'] = 'ball sport'
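If the column can hold missing values, str.contains may return NaN for those rows, and depending on the pandas version the result then cannot be used as a boolean mask. A minimal sketch, assuming a hypothetical frame with one missing entry (the extra 'Tom' row is not from the question), that passes na=False so missing values count as "no match" and regex=False because 'ball' is a plain substring here:

```python
import pandas as pd

# Hypothetical data: the question's frame plus one row with a missing sport.
df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice', 'Tom'],
                   'sport': ['tennis', 'football', 'basketball', None]})

# na=False treats missing values as "no match", so the mask is purely boolean.
# regex=False matches 'ball' literally instead of as a regular expression.
mask = df['sport'].str.contains('ball', na=False, regex=False)
df.loc[mask, 'sport'] = 'ball sport'
```

The missing entry is left untouched; only rows whose text contains 'ball' are overwritten.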

edited Sep 29 '16 at 12:05

answered Sep 29 '16 at 11:06

EdChum

2

`.contains` also accepts regex, so you could add the case-insensitive flag to the string instead of passing `case=False`, like: `.str.contains(r'(?i)ball')`. – Julio Cezar Silva Aug 06 '20 at 18:49
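As a quick check of the comment above, the following sketch compares the inline `(?i)` flag with `case=False` on a hypothetical mixed-case Series (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical mixed-case data, not from the question.
sports = pd.Series(['tennis', 'Football', 'BASKETBALL'])

by_flag = sports.str.contains(r'(?i)ball')          # inline regex flag
by_param = sports.str.contains('ball', case=False)  # keyword argument

# Both approaches should produce the same boolean mask.
same = by_flag.equals(by_param)
```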

score 23 · Answer 2 · answered Sep 29 '16 at 11:07

You can use apply with a lambda. The x parameter of the lambda function will be each value in the 'sport' column:

df.sport = df.sport.apply(lambda x: 'ball sport' if 'ball' in x else x)
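One caveat: `'ball' in x` raises a TypeError when x is not a string, for example a missing value. A minimal sketch, assuming a hypothetical frame with one missing entry and a helper named to_ball_sport (both introduced here for illustration), that guards the check with isinstance:

```python
import pandas as pd

# Hypothetical data: the question's frame plus one row with a missing sport.
df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice', 'Tom'],
                   'sport': ['tennis', 'football', 'basketball', None]})

def to_ball_sport(value):
    # Only test strings; leave missing or non-string values untouched.
    if isinstance(value, str) and 'ball' in value:
        return 'ball sport'
    return value

df['sport'] = df['sport'].apply(to_ball_sport)
```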

answered Sep 29 '16 at 11:07

DeepSpace

score 15 · Answer 3 · answered Sep 29 '16 at 11:10

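You can use str.replace with a regular expression that matches the whole string. Pass regex=True explicitly, because the pattern is treated as a literal string by default in pandas 2.0 and later:

df.sport.str.replace(r'(^.*ball.*$)', 'ball sport', regex=True)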

0        tennis
1    ball sport
2    ball sport
Name: sport, dtype: object

Reassign the result to keep it:

df['sport'] = df.sport.str.replace(r'(^.*ball.*$)', 'ball sport', regex=True)
df
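The same kind of pattern can also be passed to DataFrame.replace with regex=True, which stays close to the nested-dict form in the question. A minimal sketch, assuming the substring is held in a hypothetical variable named substring and escaped with re.escape in case it contains regex metacharacters:

```python
import re
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Jane', 'Alice'],
                   'sport': ['tennis', 'football', 'basketball']})

substring = 'ball'  # hypothetical variable holding the text to look for
# re.escape guards against substrings that contain regex metacharacters.
pattern = rf'^.*{re.escape(substring)}.*$'

# Nested dict: in column 'sport', replace values matching the pattern.
# replace returns a new DataFrame; df itself is not modified.
result = df.replace({'sport': {pattern: 'ball sport'}}, regex=True)
```

Because the pattern is anchored with ^ and $ and padded with .*, the whole cell is replaced, not just the matching substring.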

answered Sep 29 '16 at 11:10

piRSquared

score 3 · Answer 4 · answered Feb 09 '18 at 03:26

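A different way to use str.contains, indexing the column with the boolean mask directly:

df['sport'][df['sport'].str.contains('ball')] = 'ball sport'

Note that this is chained assignment: it can raise SettingWithCopyWarning and has no effect on df when copy-on-write is enabled, so the df.loc form is safer.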

answered Feb 09 '18 at 03:26

Axis

score 0 · Answer 5 · answered Feb 01 '22 at 15:26

You can also use a lambda function:

import pandas as pd
data = {"number": [1, 2, 3, 4, 5],
        "function": ['IT', 'IT application', 'IT digital', 'other', 'Digital']}
df = pd.DataFrame(data)
df['function'] = df['function'].apply(lambda x: 'IT' if 'IT' in x else x)

answered Feb 01 '22 at 15:26

prashangrg
