3

I would like to drop all the row which are not in a list in pandas DataFrame

For instance, consider this dataframe :

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
    'year': [2012, 2012, 2013, 2014, 2014], 
    'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa         Cruz', 'Maricopa', 'Yuma'])
df

To drop a row by name it's easy :

df = df[df.name != 'Tina'] # to drop the row which include Tina in the name column

But if I want to keep only the row Jason and Molly :

List=['Jason', 'Molly']
df = df[df.name not in List]

doesn't work !

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Coolpix
  • 503
  • 1
  • 6
  • 20

3 Answers3

6

Use isin and pass the list as the arg and invert the condition using ~:

In [58]:
names = ['Jason', 'Molly']
df[~df['name'].isin(names)]

Out[58]:
                    name  reports  year
Santa         Cruz  Tina       31  2013
Maricopa            Jake        2  2014
Yuma                 Amy        3  2014
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • The new SO ordering of answers by time kinda removes the ambiguity I guess – EdChum Jun 03 '16 at 08:43
  • @jezrael it's already been implemented for a few weeks if you look any old answers, you will see that the accepted answer is not necessarily the first ordered answer or that the most highly voted answer is ordered first, all answers are now ordered by time – EdChum Jun 03 '16 at 08:46
  • Hi. Could you explain what "~" mean plz ? – Coolpix Jun 03 '16 at 11:42
  • `~` negates or inverts the boolean array it's equivalent to `not` – EdChum Jun 03 '16 at 11:58
2

Use isin with inverting boolean mask by ~. I think better is use other name of list as List, e.g. l or names as use EdChum:

List=['Jason', 'Molly']
df = df[~df.name.isin(List)]

l=['Jason', 'Molly']
df = df[~df.name.isin(l)]

print (df)
                    name  reports  year
Santa         Cruz  Tina       31  2013
Maricopa            Jake        2  2014
Yuma                 Amy        3  2014

Explanation:

You can use mask with inverting boolean with ~ with boolean indexing:

print (df['name'].isin(l))
Cochice                True
Pima                   True
Santa         Cruz    False
Maricopa              False
Yuma                  False
Name: name, dtype: bool

print (~df['name'].isin(l))
Cochice               False
Pima                  False
Santa         Cruz     True
Maricopa               True
Yuma                   True
Name: name, dtype: bool
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I try explain more `~`, please see update. Also you can check this [post](http://stackoverflow.com/questions/15998188/how-can-i-obtain-the-element-wise-logical-not-of-a-pandas-series). – jezrael Jun 03 '16 at 11:51
  • Thanks I will check your post :) – Coolpix Jun 03 '16 at 11:57
  • Btw, It is interesting phenomen in SO - I think our answer is almost equal, but one is (highly) upvote and another not. Hmmm, it is interesting. ;) – jezrael Jun 03 '16 at 11:58
1

isin is great. Another way using query would be

In [821]: List = ['Jason', 'Molly']

In [822]: df.query('name not in @List')
Out[822]:
                    name  reports  year
Santa         Cruz  Tina       31  2013
Maricopa            Jake        2  2014
Yuma                 Amy        3  2014
Zero
  • 74,117
  • 18
  • 147
  • 154