Remove rows not .isin('X')

Question

Sorry, I'm just getting into pandas; this seems like it should be a very straightforward question. How can I use isin('X') to remove rows that are in the list X? In R I would write !which(a %in% b).

Jonny Brooks · Answer 1 · 2020-04-17T16:59:37.070

156

You have several options. Collating some of the other answers and the accepted answer from this post, you can do:
1. df[-df["column"].isin(["value"])]
2. df[~df["column"].isin(["value"])]
3. df[df["column"].isin(["value"]) == False]
4. df[np.logical_not(df["column"].isin(["value"]))]

Note: for option 4 you'll need to import numpy as np. Option 2 is the usual idiom; the unary minus in option 1 is not supported on plain NumPy boolean arrays, so ~ is the safer choice.

Update: You can also use the .query method, which allows for method chaining:
5. df.query("column not in @values")
where values is a list of the values that you don't want to include.
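
As a rough illustration of how options 2 to 5 line up, here is a minimal sketch on a made-up frame. The sample data, the column name "column", and the values list are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "column" and `values` are illustrative names.
df = pd.DataFrame({"column": ["a", "b", "c", "a"], "n": [1, 2, 3, 4]})
values = ["a"]

# Boolean mask: True for rows whose value is in `values` (rows to drop).
mask = df["column"].isin(values)

kept_tilde = df[~mask]                           # option 2
kept_eq = df[mask == False]                      # option 3
kept_np = df[np.logical_not(mask)]               # option 4
kept_query = df.query("column not in @values")   # option 5

print(kept_tilde)
```

All four variants should select the same rows here; the query form is mainly convenient inside a chain of method calls.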

edited Apr 17 '20 at 16:59

answered Feb 09 '17 at 09:52

Jonny Brooks


What is the difference between `~` and `-`? Is this pandas-specific? – stragu Jan 27 '21 at 07:41

@stragu I don't think this is pandas-specific. The `~` [is a bitwise operation](https://stackoverflow.com/a/46054354/4543854) which in this case leads to the same result as using `-`. Unfortunately, I don't know enough about bitwise operators to give an in-depth answer to your question. – Jonny Brooks Feb 01 '21 at 11:49
Time profiling/scaling..? – jtlz2 Apr 19 '22 at 10:34

score 75 · Answer 2 · edited Jan 22 '19 at 04:51


You can use numpy.logical_not to invert the boolean array returned by isin:

In [63]: s = pd.Series(np.arange(10.0))

In [64]: x = range(4, 8)

In [65]: mask = np.logical_not(s.isin(x))

In [66]: s[mask]
Out[66]: 
0    0.0
1    1.0
2    2.0
3    3.0
8    8.0
9    9.0
dtype: float64

As Wes McKinney noted in a comment, you can also use:

s[~s.isin(x)]
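
For readers who prefer a standalone script over an interactive session, the steps above can be sketched as follows. This is only a restatement of the same example, with variable names such as mask_np and mask_tilde chosen here for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10.0))   # 0.0, 1.0, ..., 9.0
x = range(4, 8)                  # values to exclude: 4, 5, 6, 7

# Two ways to invert the boolean mask returned by isin.
mask_np = np.logical_not(s.isin(x))
mask_tilde = ~s.isin(x)

# Keep only the entries whose value is not in x.
result = s[mask_tilde]
print(result)
```

Both masks should be identical, so the choice between them is a matter of style.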

edited Jan 22 '19 at 04:51

cs95


answered Dec 27 '12 at 17:46

bmu


score 29 · Answer 3 · answered Nov 11 '15 at 01:44


All you have to do is create a subset of your dataframe where the isin method evaluates to False:

df = df[df['Column Name'].isin(['Value']) == False]

answered Nov 11 '15 at 01:44

atm


score 5 · Answer 4 · edited Jul 12 '17 at 08:46


You can use the DataFrame.select method (deprecated in pandas 0.21 and removed in 1.0, so this only runs on older versions):

In [1]: df = pd.DataFrame([[1,2],[3,4]], index=['A','B'])

In [2]: df
Out[2]: 
   0  1
A  1  2
B  3  4

In [3]: L = ['A']

In [4]: df.select(lambda x: x in L)
Out[4]: 
   0  1
A  1  2
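
On pandas 1.0 and later, one possible replacement is a boolean mask built from df.index.isin. The sketch below is an assumed equivalent rather than the method shown in this answer; the names selected and excluded are illustrative, and the inverted form covers the "not in L" case.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["A", "B"])
L = ["A"]

# Rows whose index label is in L (what the select call returned).
selected = df[df.index.isin(L)]

# Rows whose index label is NOT in L (inverted mask).
excluded = df[~df.index.isin(L)]

print(selected)
print(excluded)
```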

edited Jul 12 '17 at 08:46

a_guest


answered Dec 27 '12 at 15:40

Andy Hayden


Thanks Hayden. Sorry, I had a typo in my question: I wanted to select the rows that are not in A, so that given A it would give me back B instead. – DrewH Dec 27 '12 at 17:43
