1

I'm trying to create a new column in a dataframe that labels animals that are domesticated with a 1. I'm using a for loop, but for some reason, the loop only picks up the last item in the pets list. dog, cat, and gerbil should all be assigned a 1 under the domesticated column. Anyone have a fix for this or a better approach?

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'creature': ['dog', 'cat', 'gerbil', 'mouse', 'donkey']
    })

pets = ['dog', 'cat', 'gerbil']

for pet in pets:
    df['domesticated'] = np.where(df['creature']==pet, 1, 0)

df
cs95
bbk611

2 Answers

4

In the last loop iteration you are setting every non-gerbil row to 0. When pet is gerbil, np.where assigns 0 to ALL entries that are not equal to gerbil, including the dog and cat rows that earlier iterations had set to 1. Instead, check all values in pets at once. Try this:

df['domesticated'] = df['creature'].apply(lambda x: 1 if x in pets else 0)

If you want to stick with np.where:

df['domesticated'] = np.where(df['creature'].isin(pets), 1, 0)
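If you would rather keep a loop, one way is to accumulate the matches instead of overwriting the column on each pass. This is only a sketch of that idea, not something from the original post; the names mask and looped are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'creature': ['dog', 'cat', 'gerbil', 'mouse', 'donkey']})
pets = ['dog', 'cat', 'gerbil']

# Start with "nothing matched" and OR in each pet's matches, so results
# from earlier iterations are kept instead of being overwritten.
mask = pd.Series(False, index=df.index)  # hypothetical helper name
for pet in pets:
    mask |= df['creature'] == pet

df['domesticated'] = mask.astype(int)
looped = df['domesticated'].tolist()
print(looped)
```

The isin versions above do the same membership check in one vectorized step, so they are usually the simpler choice.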
busybear
  • This is great. I'll mark this as accepted when I'm allowed to. Do you know why the for loop doesn't work in this case? I include `dog`, `cat`, and `gerbil` in the `pets` list, so I thought iterating over that list would work fine. – bbk611 Mar 20 '19 at 23:24
  • I updated my post with more detail. The idea is the last iteration overwrites everything from before so you are essentially just checking for values that are equal to `gerbil`. – busybear Mar 20 '19 at 23:27
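To see the overwrite described in the comment above, it can help to record the column after each pass of the original loop. A small sketch follows; the history dict is a made-up name used only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'creature': ['dog', 'cat', 'gerbil', 'mouse', 'donkey']})
pets = ['dog', 'cat', 'gerbil']

history = {}  # hypothetical: column contents after each iteration
for pet in pets:
    # Same assignment as in the question: the whole column is rebuilt each time.
    df['domesticated'] = np.where(df['creature'] == pet, 1, 0)
    history[pet] = df['domesticated'].tolist()
    print(pet, history[pet])

# dog [1, 0, 0, 0, 0]
# cat [0, 1, 0, 0, 0]
# gerbil [0, 0, 1, 0, 0]
```

Each pass produces a fresh column with a single 1, and only the last one (gerbil) survives.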
1

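The problem is that every loop iteration overwrites the result of the previous one. Check membership in pets in a single step instead:

df['domesticated'] = df['creature'].isin(pets).astype(int)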

  creature  domesticated
0      dog             1
1      cat             1
2   gerbil             1
3    mouse             0
4   donkey             0
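One detail worth knowing: isin exists on both DataFrame and Series. The sketch below (frame_mask and series_mask are illustrative names, not from the answer) shows the difference. Selecting the column first gives a boolean Series, which is the safer thing to assign to a single new column if the frame ever has more than one column.

```python
import pandas as pd

df = pd.DataFrame({'creature': ['dog', 'cat', 'gerbil', 'mouse', 'donkey']})
pets = ['dog', 'cat', 'gerbil']

# DataFrame.isin checks every column and returns a boolean DataFrame.
frame_mask = df.isin(pets)

# Series.isin checks one column and returns a boolean Series.
series_mask = df['creature'].isin(pets)

# Cast True/False to 1/0 for the label column.
df['domesticated'] = series_mask.astype(int)
print(df)
```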
gold_cy