For a data frame I replaced a set of items in a column with a range of values as follows:

df['borough_num'] = df['Borough'].replace(regex=['MANHATTAN', 'BROOKLYN', 'QUEENS', 'STATEN ISLAND','BRONX'], value=[1, 2, 3, 4,5])

The issue is that I want to replace all the remaining elements in 'Borough' that are not listed above with the value 0. I also need to use regex because some of the data looks like 07 BRONX, which should be replaced by 5, not 0.

2 Answers

Following on from your previous question, you can use replace; for why it works, you can check the link.

s=df.Borough.replace(dict(zip(l,[1,2,3,4,5])),regex=True)
pd.to_numeric(s,errors = 'coerce').fillna(0).astype(int)
Out[44]: 
0    3
1    5 # note: 'BRONX 777' is still changed to 5
2    1
3    2
4    0
Name: Borough, dtype: int32

Data Input

df = pd.DataFrame({
    'Borough': ['QUEENS', 'BRONX 777', 'MANHATTAN', 'BROOKLYN', 'INVALID']})
l = ['MANHATTAN', 'BROOKLYN', 'QUEENS', 'STATEN ISLAND','BRONX']
BENY

Or even shorter, use map:

df['borough_num']=df['Borough'].map(dict(zip(['MANHATTAN', 'BROOKLYN', 'QUEENS', 'STATEN ISLAND','BRONX'],[1, 2, 3, 4,5])))

And now:

print(df)

Is as expected.

Update:

df['borough_num']=df['Borough'].str.replace(r'\d+','',regex=True).map(dict(zip(['MANHATTAN', 'BROOKLYN', 'QUEENS', 'STATEN ISLAND','BRONX'],[1, 2, 3, 4,5])))
U13-Forward
  • It looks like elements such as 07 Bronx should be replaced by 5, not zero – Mostafa Qasim Dec 12 '18 at 04:02
  • @MostafaQasim please accept and up-vote answers when they work, like this one :-), https://stackoverflow.com/help/someone-answers – U13-Forward Dec 12 '18 at 04:07
  • I found the real problem, @U9-Forward: the modified data has a space before the item. The original element is "BRONX", while the one modified by <> is stored as " BRONX" (note the space before the word), which is why it is not replaced by 5 as it should be. We also need to add .fillna(0).astype(int) from W-B's code so that irrelevant elements are replaced by 0, not NaN – Mostafa Qasim Dec 12 '18 at 17:16
  • Also, @U9-Forward, there are some elements, such as Unspecified MANHATTAN and Unspecified STATEN ISLAND, that were not modified by <>. If we split the element name at the first space and take everything after the space, it may work for all elements – Mostafa Qasim Dec 12 '18 at 17:31
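The cases raised in the comments above (leading digits, a leading space, an "Unspecified" prefix, and everything else falling back to 0) could be handled by searching each value for a known borough name instead of matching it exactly. This is only a minimal sketch: borough_to_num is a hypothetical helper, not something from pandas or from either answer, and upper-casing the input is an assumption made so that a value like 07 Bronx also matches.

```python
import re

# Borough names and their codes, taken from the question.
BOROUGH_CODES = {
    'MANHATTAN': 1,
    'BROOKLYN': 2,
    'QUEENS': 3,
    'STATEN ISLAND': 4,
    'BRONX': 5,
}

# One pattern that finds any known borough name anywhere in the text.
BOROUGH_RE = re.compile('|'.join(re.escape(name) for name in BOROUGH_CODES))


def borough_to_num(value):
    """Return the code of the borough named in value, or 0 if none is found.

    Hypothetical helper for illustration only.
    """
    if not isinstance(value, str):  # missing values such as None or NaN
        return 0
    # Assumption: matching should ignore case, so upper-case before searching.
    match = BOROUGH_RE.search(value.upper())
    return BOROUGH_CODES[match.group(0)] if match else 0


samples = ['QUEENS', '07 BRONX', ' BRONX', 'Unspecified MANHATTAN',
           'Unspecified STATEN ISLAND', 'INVALID']
codes = [borough_to_num(s) for s in samples]
print(codes)  # [3, 5, 5, 1, 4, 0]
```

On the data frame from the question this would presumably be applied as df['borough_num'] = df['Borough'].map(borough_to_num), since Series.map also accepts a function. Because the helper returns 0 itself, no separate .fillna(0).astype(int) step should be needed.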