
I have a large pandas DataFrame that contains a column called 'id' and another column called 'species', among other columns. I need to change values in the 'species' column based on specific values of the 'id' column.

For example, if the 'id' is '5555555' (a string), I want the 'species' value to change from its current value 'dove' (also a string) to 'hummingbird'. So far I have been using this method:

df.loc[df["id"] == '5555555', "species"] = 'hummingbird'

Here is a short sample data frame:

import pandas as pd
        
#Starting dataset
d = {'id': ['11111111', '22222222', '33333333', '44444444', '55555555', '66666666', '77777777', '88888888'], 'species': ['dove', 'dove', 'dove', 'hummingbird', 'hummingbird', 'dove', 'hummingbird', 'dove']}
df = pd.DataFrame(data=d)
df
    
    id          species
0   11111111    dove
1   22222222    dove        #wants to replace
2   33333333    dove        #wants to replace
3   44444444    hummingbird
4   55555555    hummingbird
5   66666666    dove
6   77777777    hummingbird
7   88888888    dove        #wants to replace        
     
#Expected outcome
d = {'id': ['11111111', '22222222', '33333333', '44444444', '55555555', '66666666', '77777777', '88888888'], 'species': ['dove', 'hummingbird', 'hummingbird', 'hummingbird', 'hummingbird', 'dove', 'hummingbird', 'hummingbird']}
df = pd.DataFrame(data=d)
df
    
    id          species
0   11111111    dove
1   22222222    hummingbird #replaced
2   33333333    hummingbird #replaced
3   44444444    hummingbird
4   55555555    hummingbird
5   66666666    dove
6   77777777    hummingbird
7   88888888    hummingbird #replaced

This is fine for a small number of rows, but I have to do it for about 1000 rows, each with its own 'id'. I thought I could write a loop and feed it the list of 'id' values, but I honestly do not know where to start.

Thanks in advance!!

And thanks to Scott Boston for pointing me in the right direction to ask better questions!

lwebgru
  • kindly add sample dataset with expected output – sammywemmy Jul 07 '21 at 00:56
  • Will do, thank you for the advice! – lwebgru Jul 07 '21 at 01:16
  • Why does 2222 and 3333 change but not 1111? – Scott Boston Jul 07 '21 at 02:23
  • @ScottBoston Because those are the ones I want to change. The assumption is that those specific 'id' values were labeled incorrectly in my dataset, so I want to change them. In real life I have about 1000 of them, so I wanted to know whether there is a more efficient way than copying and pasting the method from the original post. – lwebgru Jul 07 '21 at 15:49
  • Okay... for this question it is best if you create a small toy dataset, a list of ids you want to change, and the values you want to change them to. Your question as stated is pretty broad, hence you have no answers yet. I suspect what you are trying to accomplish is pretty easy; we are just not sure what your inputs and expected output are. See this [post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples?answertab=oldest#tab-top) to help us help you. – Scott Boston Jul 07 '21 at 16:17
  • @ScottBoston Thank you so much for pointing me in the right direction to ask better questions! I'm sure I'll get better with time, but I hope this new version of my question clarifies what I want to accomplish. – lwebgru Jul 07 '21 at 17:08

1 Answer

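Use isin with a list of the ids to change. The ids must be strings, because the 'id' column holds strings; integer ids would match nothing.

# ids marked for replacement in the question, as strings
humming_ids = ['22222222', '33333333', '88888888']
df.loc[df['id'].isin(humming_ids), "species"] = 'hummingbird'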
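
For reference, here is a minimal, self-contained sketch of the isin approach applied to the sample data frame from the question. It assumes the ids are stored as strings and that the three rows marked for replacement in the question are the ones to change; the variable name `mask` is just illustrative.

```python
import pandas as pd

# Sample data from the question; ids are strings
df = pd.DataFrame({
    "id": ["11111111", "22222222", "33333333", "44444444",
           "55555555", "66666666", "77777777", "88888888"],
    "species": ["dove", "dove", "dove", "hummingbird",
                "hummingbird", "dove", "hummingbird", "dove"],
})

# ids whose species should become 'hummingbird' (strings, to match the column)
humming_ids = ["22222222", "33333333", "88888888"]

# Boolean mask: True for every row whose id appears in the list
mask = df["id"].isin(humming_ids)

# Overwrite 'species' only on the matching rows
df.loc[mask, "species"] = "hummingbird"

print(df)
```

Because the mask is built in one vectorized call, no explicit loop over the ids is needed, and the same two lines should work whether the list holds three ids or a thousand.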
Vishnudev Krishnadas
  • Thank you so much @Vishnudev! This worked great. I also want to mention that I combined your answer with an answer from another post (https://stackoverflow.com/questions/41768196/python-convert-dataframe-into-a-list-with-string-items-inside-list) since I also needed to extract the id from an excel sheet and then convert them into a python list to use your answer. You have no idea how much this will help me! – lwebgru Jul 07 '21 at 18:32
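
The step described in the last comment (taking ids from a spreadsheet column and turning them into a Python list) might look roughly like the sketch below. This is an assumption about the workflow, not the commenter's actual code: the `sheet` frame stands in for whatever `pd.read_excel` would return, the file name in the comment is hypothetical, and the ids are assumed to arrive as integers with no missing values.

```python
import pandas as pd

# Stand-in for a sheet loaded with something like pd.read_excel("ids.xlsx")
# (hypothetical file name). Excel often yields numeric ids, so they are ints here.
sheet = pd.DataFrame({"id": [22222222, 33333333, 88888888]})

# Convert to strings so they compare equal to the string ids in the main frame
id_list = sheet["id"].astype(str).tolist()

# Small frame with string ids, as in the question
df = pd.DataFrame({
    "id": ["11111111", "22222222", "33333333", "88888888"],
    "species": ["dove", "dove", "dove", "dove"],
})

# Same isin pattern as in the answer, fed with the converted list
df.loc[df["id"].isin(id_list), "species"] = "hummingbird"

print(id_list)
print(df)
```

If the spreadsheet column is read as floats (for example because it contains blank cells), the string conversion would produce values like '22222222.0', so the ids would need cleaning before this step.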