
I have a pandas series with boolean entries. I would like to get a list of indices where the values are True.

For example, the input pd.Series([True, False, True, True, False, False, False, True])

should yield the output [0,2,3,7].

I can do it with a list comprehension, but is there something cleaner or faster?
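
(A sketch of what that list comprehension might look like, using the example series above:)

>>> import pandas as pd
>>> s = pd.Series([True, False, True, True, False, False, False, True])
>>> [i for i in s.index if s[i]]
[0, 2, 3, 7]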

James McKeown
    A better testcase is `s = pd.Series([True, False, True, True, False, False, False, True], index=list('ABCDEFGH'))`. Expected output: `Index(['A', 'C', 'D', 'H'], ...)`. Since some solutions (esp. all the np functions) drop the index and use the autonumber index. – smci Apr 21 '21 at 22:42
  • ...if we have a named index, it's usually very undesirable to drop it. – smci Apr 21 '21 at 22:57

4 Answers


Using boolean indexing

>>> s = pd.Series([True, False, True, True, False, False, False, True])
>>> s[s].index
Int64Index([0, 2, 3, 7], dtype='int64')

If you need a np.array object, get the .values

>>> s[s].index.values
array([0, 2, 3, 7])

Using np.nonzero

>>> np.nonzero(s)
(array([0, 2, 3, 7]),)

Using np.flatnonzero

>>> np.flatnonzero(s)
array([0, 2, 3, 7])

Using np.where

>>> np.where(s)[0]
array([0, 2, 3, 7])

Using np.argwhere

>>> np.argwhere(s).ravel()
array([0, 2, 3, 7])

Using pd.Series.index

>>> s.index[s]
Int64Index([0, 2, 3, 7], dtype='int64')

Using Python's built-in filter

>>> [*filter(s.get, s.index)]
[0, 2, 3, 7]

Using list comprehension

>>> [i for i in s.index if s[i]]
[0, 2, 3, 7]
rafaelc
  • what if the series index has labels instead of an index range? – Pyd Nov 27 '19 at 17:37
  • @pyd then you can use the options referred to in the answer as `Boolean Indexing`, `pd.Series.index`, `filter` and `list comprehension` — basically NOT the numpy ones – Dahn Apr 14 '20 at 07:27
  • @Dahn I did not understand your answer. Can you explain further? – MattS Apr 26 '20 at 13:17
  • @MattS If the series has an index **other than** a range index, then any of the methods listed in `rafaelc`'s answer that are based on numpy won't work, as numpy drops the index upon conversion. I therefore listed the methods that do still work in that case; see the sketch below this comment thread. Does that work for you? – Dahn Apr 27 '20 at 06:40
  • I think we should also mention the `.where()` method here. Check: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html – iedmrc Aug 03 '20 at 08:54
  • TIMTOWTDI FTW!! – Mustafa Aydın Sep 01 '20 at 18:24
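
To illustrate the comments above (a sketch using the labeled-index test case suggested by smci; the pandas-based options keep the custom labels, while the NumPy-based ones return integer positions):

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([True, False, True, True, False, False, False, True], index=list('ABCDEFGH'))
>>> s[s].index
Index(['A', 'C', 'D', 'H'], dtype='object')
>>> s.index[s]
Index(['A', 'C', 'D', 'H'], dtype='object')
>>> np.flatnonzero(s)
array([0, 2, 3, 7])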

As an addition to rafaelc's answer, here are the corresponding timings (from quickest to slowest) for the following setup

import numpy as np
import pandas as pd
s = pd.Series([x > 0.5 for x in np.random.random(size=1000)])

Using np.where

>>> timeit np.where(s)[0]
12.7 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using np.flatnonzero

>>> timeit np.flatnonzero(s)
18 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using pd.Series.index

The time difference compared to boolean indexing really surprised me, since boolean indexing is the more commonly used approach.

>>> timeit s.index[s]
82.2 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using boolean indexing

>>> timeit s[s].index
1.75 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a np.array object, get the .values

>>> timeit s[s].index.values
1.76 ms ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a slightly easier to read version <-- not in original answer

>>> timeit s[s==True].index
1.89 ms ± 3.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using pd.Series.where <-- not in original answer

>>> timeit s.where(s).dropna().index
2.22 ms ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> timeit s.where(s == True).dropna().index
2.37 ms ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using pd.Series.mask <-- not in original answer (note: mask replaces values where the condition is True, so as written these actually select the False positions; inverting the condition, e.g. s.mask(~s).dropna().index, gives the True positions)

>>> timeit s.mask(s).dropna().index
2.29 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> timeit s.mask(s == True).dropna().index
2.44 ms ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using list comprehension

>>> timeit [i for i in s.index if s[i]]
13.7 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using Python's built-in filter

>>> timeit [*filter(s.get, s.index)]
14.2 ms ± 28.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using np.nonzero <-- did not work out of the box for me

>>> timeit np.nonzero(s)
ValueError: Length of passed values is 1, index implies 1000.

Using np.argwhere <-- did not work out of the box for me

>>> timeit np.argwhere(s).ravel()
ValueError: Length of passed values is 1, index implies 1000.
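
A possible workaround (not in the original answer; just a sketch, not benchmarked): convert the Series to a plain NumPy array first, so the result is not wrapped back into pandas and the ValueError above is avoided

>>> np.nonzero(s.to_numpy())[0]        # positions of the True values, as a plain array
>>> np.argwhere(s.to_numpy()).ravel()  # same idea for np.argwhere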

Also works: s.where(lambda x: x).dropna().index, and it has the advantage of being easy to chain in a pipe - if your series is being computed on the fly, you don't need to assign it to a variable.

Note that if s is computed from r as s = cond(r), then you can also use r.where(lambda x: cond(x)).dropna().index.
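
(A small sketch of the chained form, with a hypothetical series r and the condition x > 0 standing in for cond:)

>>> import pandas as pd
>>> r = pd.Series([0.1, -0.2, 0.3], index=list('abc'))
>>> r.where(lambda x: x > 0).dropna().index
Index(['a', 'c'], dtype='object')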

tsvikas
  • *"it has the advantage of being easy to chain"* -- You can pass a function as an indexer, so this works: `s[lambda x: x].index` – wjandrea Jun 18 '23 at 22:47

You can use pipe or loc to chain the operation; this is helpful when s is an intermediate result and you don't want to name it.

s = pd.Series([True, False, True, True, False, False, False, True], index=list('ABCDEFGH'))

out = s.pipe(lambda s_: s_[s_].index)
# or
out = s.pipe(lambda s_: s_[s_]).index
# or
out = s.loc[lambda s_: s_].index
print(out)

Index(['A', 'C', 'D', 'H'], dtype='object')
Ynjxsjmh