import pandas as pd

df = pd.DataFrame({
    'name':['john','mary','paul','john','mary'],
    'num_children':[0,4,5,28,28],
    'num_pets':[0,1,2,28,28]
})

df.replace({'name':{'john':'works','mary':'works'}})

I want to do the equivalent of the above code: replace all values in the 'name' column that are not "paul" with "works". The example only has three possible values, so it's not too bad, but is there an easier way to do this for a column with many more possible values?

Thanks in advance!

Vishnudev Krishnadas
Garry W

3 Answers


You can try with this:

df.loc[df['name'] != 'paul', 'name'] = 'works'
ALollz
baccandr
  • why is it better than series.where or Series.mask? – ansev Oct 30 '19 at 18:30
  • When I change it to df.loc[df['name'] !={'paul','john'}, 'name'] = 'works', the whole 'name' column changed to 'works'. Essentially I want to exclude both 'paul' and 'john' and change the rest to 'works'. Can you point out where I did wrong? – Garry W Oct 30 '19 at 18:32
  • @GarryW If you have a list of names then I would use `isin` as proposed by @marcocarranza. Something like: `df.loc[~df['name'].isin(["paul","john"])] = 'works'` – baccandr Oct 30 '19 at 18:39
  • @ansev I think the two solutions are substantially equivalent. – baccandr Oct 30 '19 at 18:43
  • In this case you need: `df.loc[~df['name'].isin(['paul','john']), 'name'] = 'works'`..see my solution – ansev Oct 30 '19 at 18:44
  • If both solutions are similar, then why not vote? @baccandr – ansev Oct 30 '19 at 18:46
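
The exchange above about excluding several names can be sketched as follows. This is only an illustration assembled from the comments, not any commenter's own code; the variable names `keep` and `mask` are assumptions introduced here. Comparing a column against a set with `!=` does not test membership, which matches the reported result of every row changing, whereas `isin` checks each value against the collection.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['john', 'mary', 'paul', 'john', 'mary'],
    'num_children': [0, 4, 5, 28, 28],
    'num_pets': [0, 1, 2, 28, 28],
})

# Hypothetical list of names that should be left untouched.
keep = ['paul', 'john']

# isin() gives one boolean per row; ~ inverts it, so the mask is True
# for rows whose name is NOT in `keep`.
mask = ~df['name'].isin(keep)

# The 'name' column selector limits the assignment to that column.
# Without it, every column of the matching rows would be overwritten.
df.loc[mask, 'name'] = 'works'

print(df)
```

Only the 'name' column is modified; the numeric columns keep their original values.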

It's very simple:

names = ['john', 'mary']  # avoid naming this `list`, which would shadow the built-in

df.loc[df['name'].isin(names), 'name'] = 'works'

In [1]: df
    name  num_children  num_pets
0  works             0         0
1  works             4         1
2   paul             5         2
3  works            28        28
4  works            28        28
mcrrnz

Use Series.where:

df['name']=df.name.where(df.name=='paul','works')
print(df)
    name  num_children  num_pets
0  works             0         0
1  works             4         1
2   paul             5         2
3  works            28        28
4  works            28        28

or Series.mask:

df['name']=df.name.mask(df.name!='paul','works')
print(df)
    name  num_children  num_pets
0  works             0         0
1  works             4         1
2   paul             5         2
3  works            28        28
4  works            28        28

For more than one name:

df['name']=df.name.where(df.name.isin(['paul','john']),'works')
print(df)
    name  num_children  num_pets
0   john             0         0
1  works             4         1
2   paul             5         2
3   john            28        28
4  works            28        28

or with loc:

df.loc[~df['name'].isin(['paul','john']), 'name'] = 'works'
print(df)

    name  num_children  num_pets
0   john             0         0
1  works             4         1
2   paul             5         2
3   john            28        28
4  works            28        28
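
As a quick sanity check (not part of the original answer), the three variants can be compared on a fresh copy of the 'name' column. `where` keeps values where the condition is True and replaces the rest, while `mask` replaces values where the condition is True, so the two take opposite conditions. The names `keep`, `via_where`, `via_mask`, `via_loc` and `same` are introduced here purely for illustration.

```python
import pandas as pd

names = pd.Series(['john', 'mary', 'paul', 'john', 'mary'], name='name')

# Hypothetical list of names to leave unchanged.
keep = ['paul', 'john']

# where(): keep values where the condition holds, replace the others.
via_where = names.where(names.isin(keep), 'works')

# mask(): replace values where the condition holds, so the condition is inverted.
via_mask = names.mask(~names.isin(keep), 'works')

# loc: assign in place on a copy, using the same inverted condition.
via_loc = names.copy()
via_loc.loc[~via_loc.isin(keep)] = 'works'

# True if all three approaches produced identical Series.
same = via_where.equals(via_mask) and via_where.equals(via_loc)
print(same)
print(via_where.tolist())
```

Note that `where` and `mask` return a new Series, which is why the answer assigns the result back to `df['name']`, whereas the `loc` form modifies the object in place.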
ansev