2

I have a list of assorted values which I need to replace with a single value (Drive-by). I did my research but the closest post I could find is the below attached link which is not using Pandas. What is the most viable way to get this accomplished?

Python replace multiple strings

fourth = pd.read_csv('C:/infocentertracker.csv')
fourth = fourth.rename(columns={'Phone Number: ': 'Phone Number:'})
fourth['Source:'] = fourth['Source:'].replace('......', 'Drive-by')

fourth.to_csv(.............)

Drive By
Drive-By
Drive-by; Return Visitor
Drive/LTX.com/Internes Srch                  Replace all with Drive-by
Driving By/Lantana Website
Drive by
Driving By/Return Visitor
Drive by/Resident Referral
Driving by
Drive- by
Driving by/LTX Website
Driving By
Driving by/Return Visitor
Drive By/Return Visitor
Drive By/LTX Website
Community
  • 1
  • 1
Jake Wagner
  • 786
  • 2
  • 12
  • 29

3 Answers3

2

You can use boolean mask by str.startswith for replace all values starts with Driv and idea is from comment of Marat:

df.loc[df.col.str.startswith('Driv'), 'col'] = 'Drive-by'

Sample:

print (fourth)
                            col
0                      Drive By
1                      Drive-By
2      Drive-by; Return Visitor
3   Drive/LTX.com/Internes Srch
4    Driving By/Lantana Website
5                      Drive by
6     Driving By/Return Visitor
7    Drive by/Resident Referral
8                    Driving by
9                     Drive- by
10       Driving by/LTX Website
11                   Driving By
12    Driving by/Return Visitor
13      Drive By/Return Visitor
14         Drive By/LTX Website
15                          aaa
fourth.loc[fourth['Source:'].str.startswith('Driv'), 'Source:'] = 'Drive-by'
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa

Another solution with Series.mask:

fourth['Source:']=fourth['Source:'].mask(fourth['Source:'].str.startswith('Driv', na=False),
                                       'Drive-by')
print (fourth)
     Source:
0   Drive-by
1   Drive-by
2   Drive-by
3   Drive-by
4   Drive-by
5   Drive-by
6   Drive-by
7   Drive-by
8   Drive-by
9   Drive-by
10  Drive-by
11  Drive-by
12  Drive-by
13  Drive-by
14  Drive-by
15       aaa
Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, sorry if this may sound stupid, I tried fourth.loc[fourth.col.str.startswith('Driv'), 'Source:'] = 'Drive-by' but it threw out an error.....'DataFrame' object has no attribute 'col' – Jake Wagner Feb 10 '17 at 14:44
  • It is column nmae, I change it by your column name to `Source:` – jezrael Feb 10 '17 at 14:46
1

One option is the following as you requested a pandas method:

fourth.ix[fourth['column name with values'].str.contains('driv', case=False, na=False), 'column name with values'] = 'Drive-by'

I would prefer to use regex which is not necessarily requiring pandas:

import re

[re.sub('(Driv.+)', 'Drive-by', i) for i in fourth['column name']]
A.Kot
  • 7,615
  • 2
  • 22
  • 24
  • Thanks, I get an error...ValueError: cannot index with vector containing NA / NaN values – Jake Wagner Feb 10 '17 at 14:40
  • @Pythoner I added an extra parameter in the str.contains which is `na=False`. All native pandas functions. Just wasn't sure what your data looked like – A.Kot Feb 10 '17 at 14:45
1

You can replace multiple values (a list) with a single value in Pandas

govt_alias = ['govt', 'govern']
df['installer'].str.replace('|'.join(govt_alias), 'government')

In your specific case, the other answers are more optimal, but the method I've shown is generalize-able.

clued__init__
  • 181
  • 2
  • 5