How to replace multiple values in a dataframe column in Python Pandas?

Question

import pandas as pd

df = pd.DataFrame({
    'name':['john','mary','paul','john','mary'],
    'num_children':[0,4,5,28,28],
    'num_pets':[0,1,2,28,28]
})

df.replace({'name':{'john':'works','mary':'works'}})

I want to do the equivalent of the code above: replace every value in the 'name' column that is not "paul" with "works". The example has only three possible values, so it's not too bad, but is there an easier way to do this for a column with many more possible values?

Thanks in advance!

score 3 · Accepted Answer · edited Oct 30 '19 at 18:25

You can try with this:

df.loc[df['name'] != 'paul', 'name'] = 'works'
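
A minimal, self-contained sketch of the line above, reusing the sample data from the question. The boolean mask picks the rows, and the second argument to `.loc` restricts the assignment to the 'name' column. The variable name `mask` is only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'mary', 'paul', 'john', 'mary'],
    'num_children': [0, 4, 5, 28, 28],
    'num_pets': [0, 1, 2, 28, 28],
})

# True for every row whose name is not 'paul'
mask = df['name'] != 'paul'

# Assign only in the 'name' column of the selected rows;
# the other columns are left untouched
df.loc[mask, 'name'] = 'works'

print(df['name'].tolist())
```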

edited Oct 30 '19 at 18:25

ALollz

answered Oct 30 '19 at 18:21

baccandr

Why is it better than Series.where or Series.mask? – ansev Oct 30 '19 at 18:30
When I change it to df.loc[df['name'] != {'paul','john'}, 'name'] = 'works', the whole 'name' column changes to 'works'. Essentially I want to exclude both 'paul' and 'john' and change the rest to 'works'. Can you point out where I went wrong? – Garry W Oct 30 '19 at 18:32
@GarryW If you have a list of names then I would use `isin` as proposed by @marcocarranza. Something like: `df.loc[~df['name'].isin(["paul","john"])] = 'works'` – baccandr Oct 30 '19 at 18:39
@ansev I think the two solutions are substantially equivalent. – baccandr Oct 30 '19 at 18:43
In this case you need: `df.loc[~df['name'].isin(['paul','john']), 'name'] = 'works'`..see my solution – ansev Oct 30 '19 at 18:44
If both solutions are similar, then why not vote? @baccandr – ansev Oct 30 '19 at 18:46
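
To see why the comments above settle on `isin`, it can help to print the mask on its own. The sketch below assumes the names to keep are in a list called `keep` (a hypothetical name). `isin` tests each value for membership in that list, which a plain `!=` against a collection does not appear to do, and `~` inverts the result so that True marks the rows to overwrite.

```python
import pandas as pd

names = pd.Series(['john', 'mary', 'paul', 'john', 'mary'])

# Hypothetical list of names that should be left unchanged
keep = ['paul', 'john']

# Element-wise membership test: True where the name is in `keep`
is_kept = names.isin(keep)

# Invert it: True where the value should be replaced
to_replace = ~is_kept

print(to_replace.tolist())  # [False, True, False, False, True]
```

A mask like this goes in the row position of `.loc`, with 'name' as the column, as in the last comment above.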

score 3 · Answer 2 · answered Oct 30 '19 at 18:25

It's very simple:

names = ['john', 'mary']  # avoid calling this "list", which would shadow the built-in

df.loc[df['name'].isin(names), 'name'] = 'works'

In [1]: df
    name  num_children  num_pets
0  works             0         0
1  works             4         1
2   paul             5         2
3  works            28        28
4  works            28        28

answered Oct 30 '19 at 18:25

mcrrnz

I guess it is better to use `~df['name'].isin(["paul"])`. – rpanai Oct 30 '19 at 18:32
or `df['name'].ne('paul')` – ansev Oct 30 '19 at 18:32
The code could be more compact, but it is easier to understand if you are a newbie. – mcrrnz Oct 30 '19 at 18:34
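
For a single excluded name, the variants mentioned in these comments should give the same mask. A small sketch to check that on the question's data (the variable names are illustrative only):

```python
import pandas as pd

names = pd.Series(['john', 'mary', 'paul', 'john', 'mary'])

# Three spellings of "name is not 'paul'"
mask_operator = names != 'paul'
mask_ne = names.ne('paul')
mask_isin = ~names.isin(['paul'])

print(mask_operator.tolist())
print(mask_ne.tolist())
print(mask_isin.tolist())
```

`isin` is the one that extends naturally to several excluded names, since it takes a list.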

ansev · Answer 3 · 2019-10-30T18:49:47.920

Use Series.where:

df['name'] = df['name'].where(df['name'] == 'paul', 'works')
print(df)
    name  num_children  num_pets
0  works             0         0
1  works             4         1
2   paul             5         2
3  works            28        28
4  works            28        28

or Series.mask:

df['name'] = df['name'].mask(df['name'] != 'paul', 'works')
print(df)
    name  num_children  num_pets
0  works             0         0
1  works             4         1
2   paul             5         2
3  works            28        28
4  works            28        28

For more than one name:

df['name'] = df['name'].where(df['name'].isin(['paul', 'john']), 'works')
print(df)
    name  num_children  num_pets
0   john             0         0
1  works             4         1
2   paul             5         2
3   john            28        28
4  works            28        28

or with loc:

df.loc[~df['name'].isin(['paul','john']), 'name'] = 'works'
print(df)

    name  num_children  num_pets
0   john             0         0
1  works             4         1
2   paul             5         2
3   john            28        28
4  works            28        28

So many different ways of doing the same thing. Which is faster? – mike01010 Mar 26 '23 at 19:08
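
One way to answer that for your own data is to time the variants with `timeit`. The sketch below is only a rough harness: the column contents, the size of 100,000 rows, and the helper names `with_loc`, `with_where` and `with_mask` are all assumptions made for illustration, and the numbers will depend on the pandas version, dtype and machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic column (assumed size and values, for illustration only)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'name': rng.choice(['john', 'mary', 'paul', 'anna'], size=100_000),
})


def with_loc(frame):
    # Boolean mask plus .loc assignment on a copy
    out = frame.copy()
    out.loc[out['name'] != 'paul', 'name'] = 'works'
    return out


def with_where(frame):
    # Series.where keeps values where the condition is True
    out = frame.copy()
    out['name'] = out['name'].where(out['name'] == 'paul', 'works')
    return out


def with_mask(frame):
    # Series.mask replaces values where the condition is True
    out = frame.copy()
    out['name'] = out['name'].mask(out['name'] != 'paul', 'works')
    return out


for func in (with_loc, with_where, with_mask):
    seconds = timeit.timeit(lambda: func(df), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

Each helper works on a copy so that every run starts from the same input; the copy cost is included in all three timings.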
