Replace multiple values with a single value using Pandas

Question

I have a list of assorted values which I need to replace with a single value (Drive-by). I did my research but the closest post I could find is the below attached link which is not using Pandas. What is the most viable way to get this accomplished?

Python replace multiple strings

fourth = pd.read_csv('C:/infocentertracker.csv')
fourth = fourth.rename(columns={'Phone Number: ': 'Phone Number:'})
fourth['Source:'] = fourth['Source:'].replace('......', 'Drive-by')

fourth.to_csv(.............)

Drive By
Drive-By
Drive-by; Return Visitor
Drive/LTX.com/Internes Srch                  Replace all with Drive-by
Driving By/Lantana Website
Drive by
Driving By/Return Visitor
Drive by/Resident Referral
Driving by
Drive- by
Driving by/LTX Website
Driving By
Driving by/Return Visitor
Drive By/Return Visitor
Drive By/LTX Website

is it safe to assume that only target values start with "Driv"? — Marat, Feb 10 '17 at 14:29

score 2 · Answer 1 · edited May 23 '17 at 11:53

You can use boolean mask by str.startswith for replace all values starts with Driv and idea is from comment of Marat:

df.loc[df.col.str.startswith('Driv'), 'col'] = 'Drive-by'

Sample:

print (fourth)
                            col
0                      Drive By
1                      Drive-By
2      Drive-by; Return Visitor
3   Drive/LTX.com/Internes Srch
4    Driving By/Lantana Website
5                      Drive by
6     Driving By/Return Visitor
7    Drive by/Resident Referral
8                    Driving by
9                     Drive- by
10       Driving by/LTX Website
11                   Driving By
12    Driving by/Return Visitor
13      Drive By/Return Visitor
14         Drive By/LTX Website
15                          aaa

fourth.loc[fourth['Source:'].str.startswith('Driv'), 'Source:'] = 'Drive-by'
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa

Another solution with Series.mask:

fourth['Source:']=fourth['Source:'].mask(fourth['Source:'].str.startswith('Driv', na=False),
                                       'Drive-by')
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa

Thanks, sorry if this may sound stupid, I tried fourth.loc[fourth.col.str.startswith('Driv'), 'Source:'] = 'Drive-by' but it threw out an error.....'DataFrame' object has no attribute 'col' — Jake Wagner, Feb 10 '17 at 14:44
It is column nmae, I change it by your column name to `Source:` — jezrael, Feb 10 '17 at 14:46

A.Kot · Accepted Answer · 2017-02-10T14:45:13.493

1

One option is the following as you requested a pandas method:

fourth.ix[fourth['column name with values'].str.contains('driv', case=False, na=False), 'column name with values'] = 'Drive-by'

I would prefer to use regex which is not necessarily requiring pandas:

import re

[re.sub('(Driv.+)', 'Drive-by', i) for i in fourth['column name']]

edited Feb 10 '17 at 14:45

answered Feb 10 '17 at 14:31

A.Kot

7,615
2
22
24

Thanks, I get an error...ValueError: cannot index with vector containing NA / NaN values – Jake Wagner Feb 10 '17 at 14:40
@Pythoner I added an extra parameter in the str.contains which is `na=False`. All native pandas functions. Just wasn't sure what your data looked like – A.Kot Feb 10 '17 at 14:45

score 1 · Answer 3 · answered Mar 15 '19 at 13:40

You can replace multiple values (a list) with a single value in Pandas

govt_alias = ['govt', 'govern']
df['installer'].str.replace('|'.join(govt_alias), 'government')

In your specific case, the other answers are more optimal, but the method I've shown is generalize-able.

Replace multiple values with a single value using Pandas

3 Answers3