Slice Pandas dataframe by index values that are (not) in a list

Question

I have a pandas dataframe, df.

I want to select all indices in df that are not in a list, blacklist.

Now, I use list comprehension to create the desired labels to slice.

ix=[i for i in df.index if i not in blacklist]  
df_select=df.loc[ix]

Works fine, but may be clumsy if I need to do this often.

Is there a better way to do this?

Possible duplicate of [dropping rows from dataframe based on a "not in" condition](https://stackoverflow.com/questions/27965295/dropping-rows-from-dataframe-based-on-a-not-in-condition) — Jim G., Sep 11 '19 at 18:34

score 169 · Accepted Answer · edited Jul 07 '17 at 14:55

169

Use isin on the index and invert the boolean index to perform label selection:

In [239]:

df = pd.DataFrame({'a':np.random.randn(5)})
df
Out[239]:
          a
0 -0.548275
1 -0.411741
2 -1.187369
3  1.028967
4 -2.755030
In [240]:

t = [2,4]
df.loc[~df.index.isin(t)]
Out[240]:
          a
0 -0.548275
1 -0.411741
3  1.028967

edited Jul 07 '17 at 14:55

Hooked

84,485
43
192
261

answered Mar 19 '15 at 08:47

EdChum

376,765
198
813
562

score 24 · Answer 2 · answered Mar 18 '15 at 23:44

You could use set() to create the difference between your original indices and those that you want to remove:

df.loc[set(df.index) - set(blacklist)]

It has the advantage of being parsimonious, as well as being easier to read than a list comprehension.

Hagrid67 · Answer 3 · 2016-12-06T19:29:15.800

4

Thanks to ASGM; I found that I needed to turn the set into a list to make it work with a MultiIndex:

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa":[1,2,3,4]}, index=mi1)
setValid = set(df1.index) - set([("a", 2)])
df1.loc[list(setValid)] # works
df1.loc[setValid] # fails

(sorry can't comment, insufficient rep)

edited Dec 06 '16 at 19:29

answered Dec 06 '16 at 18:34

Hagrid67

344
2
9

score 4 · Answer 4 · edited Dec 04 '19 at 04:25

4

If you are looking for a way to select all rows that are outside a condition you can use np.invert() given that the condition returns an array of booleans.

df.loc[np.invert(({condition 1}) & (condition 2))]

edited Dec 04 '19 at 04:25

zmag

7,825
12
32
42

answered Dec 04 '19 at 04:03

Hector Garcia L

41
2

score 4 · Answer 5 · edited Oct 22 '20 at 08:51

4

df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]

df.drop(blacklist,0)

edited Oct 22 '20 at 08:51

4b0

21,981
30
95
142

answered Oct 22 '20 at 08:50

George Xiong

41
1

7

Code-only answers are not particularly helpful. Please include a brief description of how this code solves the problem. – 4b0 Oct 22 '20 at 08:52
Please don't post only code as answer, but also provide an explanation what your code does and how it solves the problem of the question. Answers with an explanation are usually more helpful and of better quality, and are more likely to attract upvotes. – ZF007 Oct 22 '20 at 10:38
This, in my opinion, is a lot prettier/more elegant than doing `df.loc[~df.index.isin(blacklist)]`. However, it is less interpretable, since usually people only use `drop` to remove columns (so `axis=1`). – Corey Levinson Feb 05 '23 at 17:37

score 4 · Answer 6 · answered Mar 16 '22 at 12:29

4

You could use difference() to obtain the difference between your original indices and those that you want to exclude:

df.loc[df.index.difference(blacklist), :]

It has the advantage of being easier to read.

answered Mar 16 '22 at 12:29

Alexander Martins

343
1
11

Dyno Fu · Answer 7 · 2015-03-19T00:05:10.750

import pandas as pd
df = pd.DataFrame(data=[5,6,7,8], index=[1,2,3,4], columns=['D',])
blacklist = [2,3]
#your current way ...
ix=[i for i in df.index if i not in blacklist]  
df_select=df.loc[ix]

# use a mask
mask = [True if x else False for x in df.index if x not in blacklist]
df.loc[mask]

http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-label actually, loc and iloc both take a boolean array, in this case the mask. from now on you can reuse this mask and should be more efficient.

rachwa · Answer 8 · 2022-06-21T19:45:52.280

1

Inside query you can access your variable blacklist using @:

df.query('index != @blacklist')

# Or alternatively:
df.query('index not in @blacklist')

edited Jun 21 '22 at 19:45

answered May 07 '22 at 13:32

rachwa

1,805
1
14
17

score 0 · Answer 9 · edited Sep 27 '21 at 12:41

0

You can use the np.setdiff1d function which finds the set difference of two arrays.

index = np.array(blacklist)
not_index = np.setdiff1d(df.index.to_numpy(), index)
df.iloc[not_index]

edited Sep 27 '21 at 12:41

Dharman

30,962
25
85
135

answered Sep 27 '21 at 12:36

hamnghi

99
1
7

Slice Pandas dataframe by index values that are (not) in a list

9 Answers9

Linked

Related