Apply condition on pandas columns to create a boolean indexing array

Question

I want to drop specific rows from a pandas dataframe. Usually you can do that using something like

df[df['some_column'] != 1234]

The expression df['some_column'] != 1234 creates a boolean indexing array that is used to index df, so only the rows where the value is True are kept.

But in some cases, like mine, I don't see how I can express the condition in such a way, and iterating over pandas rows is way too slow to be considered a viable option.

To be more specific, I want to drop all rows where the value of a column is also a key in a dictionary, in a similar manner to the example above.

In a perfect world I would consider something like

df[df['some_column'] not in my_dict.keys()]

This obviously does not work. Any suggestions?

In your pseudo code example, did you mean `... not in my_dict]`? — Mad Physicist, Aug 02 '16 at 20:11
Possible duplicate of [How to implement 'in' and 'not in' for Pandas dataframe](http://stackoverflow.com/questions/19960077/how-to-implement-in-and-not-in-for-pandas-dataframe) — root, Aug 02 '16 at 20:19

Saurav Gupta · Accepted Answer · 2016-08-02T23:27:21.147

2

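What you're looking for is isin(). It builds the boolean mask for membership; negate the mask with ~ to drop the matching rows instead of keeping them.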

import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5, 7], [8, 9]], columns=['A', 'B'])
In[9]: df
Out[9]: 
   A  B
0  1  2
1  1  3
2  4  6
3  5  7
4  8  9
In[10]: mydict = {1: 'A', 8: 'B'}
In[11]: df[df['A'].isin(mydict.keys())]
Out[11]: 
   A  B
0  1  2
1  1  3
4  8  9
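
To drop the matching rows rather than keep them, as the question asks, the mask can be inverted with ~. A minimal sketch reusing the frame above; the names mask and kept are only illustrative and not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [5, 7], [8, 9]], columns=['A', 'B'])
mydict = {1: 'A', 8: 'B'}

# isin() returns a boolean Series: True where the value of A is a dict key.
# list(mydict) yields the keys, so this is equivalent to using mydict.keys().
mask = df['A'].isin(list(mydict))

# ~ inverts the mask, so only rows whose A value is NOT a key survive.
kept = df[~mask]
print(kept)
```

Only the rows with index 2 and 3 remain, since 4 and 5 are not keys of mydict.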

edited Aug 02 '16 at 23:27

answered Aug 02 '16 at 20:24

Saurav Gupta


4

`.isin(mydict)` works just fine on its own, you don't need to specify keys. – Alexander Aug 02 '16 at 20:31

score 1 · Answer 2 · answered Aug 02 '16 at 20:24

1

You could use query for this purpose. Local variables are referenced inside the expression with an @ prefix:

keys = list(my_dict)
df.query('some_column not in @keys')
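
A runnable sketch of the query approach, assuming a small made-up frame (the column other and the sample values are hypothetical). The dictionary keys are copied into a list first, and the list is referenced from the query string with @:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'some_column': [1, 2, 8, 9], 'other': list('abcd')})
my_dict = {1: 'A', 8: 'B'}

# query() only sees local Python names when they are prefixed with @.
keys = list(my_dict)
result = df.query('some_column not in @keys')
print(result)
```

The rows whose some_column value is 1 or 8 are dropped, leaving the rows with 2 and 9.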

answered Aug 02 '16 at 20:24

Nickil Maveli


Jose Raul Barreras · Answer 3 · 2016-08-02T20:32:42.597

1

You can use the function isin() to select rows whose column value is in an iterable.

Using lists:

my_list = ['my', 'own', 'data']
df.loc[df['column'].isin(my_list)]

Using dicts:

my_dict = {'key1': 'Some value'}
df.loc[df['column'].isin(my_dict.keys())]
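
The two snippets above can be tried end to end as follows. This is only a sketch: the frame contents and the extra column n are made up, and the last line shows one way to combine both masks, which goes slightly beyond what the answer states:

```python
import pandas as pd

# Hypothetical data; the column name 'column' follows the snippets above.
df = pd.DataFrame({'column': ['my', 'key1', 'data', 'other'], 'n': [1, 2, 3, 4]})
my_list = ['my', 'own', 'data']
my_dict = {'key1': 'Some value'}

# Keep rows whose value appears in the list.
in_list = df.loc[df['column'].isin(my_list)]

# Drop rows whose value is a dictionary key by inverting the mask with ~.
not_in_dict = df.loc[~df['column'].isin(my_dict.keys())]

# Boolean masks combine element-wise with | and &; parentheses are required.
neither = df.loc[~(df['column'].isin(my_list) | df['column'].isin(my_dict.keys()))]
```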

edited Aug 02 '16 at 20:32

answered Aug 02 '16 at 20:25

Jose Raul Barreras

