19

I'm trying to find the index of the last True value in a pandas boolean Series. My current code looks something like the below. Is there a faster or cleaner way of doing this?

import numpy as np
import pandas as pd
import string

index = np.random.choice(list(string.ascii_lowercase), size=1000)
df = pd.DataFrame(np.random.randn(1000, 2), index=index)
s = pd.Series(np.random.choice([True, False], size=1000), index=index)

last_true_idx_s = s.index[s][-1]
last_true_idx_df = df[s].iloc[-1].name
user1507844

3 Answers

24

You can use idxmax, which does the same thing as argmax in Andy Hayden's answer:

print(s[::-1].idxmax())
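As a minimal sketch of why this works, and of the all-False caveat raised in the comments: the small Series and the `s.any()` guard below are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small illustrative Series (an assumption; the question uses random data).
s = pd.Series([False, True, False, True, False], index=list("abcde"))

# idxmax returns the label of the first maximum, so on the reversed
# Series the first True found is the last True of the original.
last_true = s[::-1].idxmax() if s.any() else None
print(last_true)  # d

# Without the guard, an all-False Series still yields a label,
# because False is then the maximum value.
all_false = pd.Series([False, False, False], index=list("xyz"))
print(all_false[::-1].idxmax())  # z
```

Checking `s.any()` first is one way to avoid a misleading label when there is no True at all.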

Comparing:

Note that these timings depend heavily on the size of s as well as the number (and position) of True values (thanks to Andy Hayden for pointing this out).

In [2]: %timeit s.index[s][-1]
The slowest run took 6.92 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 35 µs per loop

In [3]: %timeit s[::-1].argmax()
The slowest run took 6.67 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 126 µs per loop

In [4]: %timeit s[::-1].idxmax()
The slowest run took 6.55 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 127 µs per loop

In [5]: %timeit s[s==True].last_valid_index()
The slowest run took 8.10 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 261 µs per loop

In [6]: %timeit (s[s==True].index.tolist()[-1])
The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 239 µs per loop

In [7]: %timeit (s[s==True].index[-1])
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 227 µs per loop
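Since the timings depend on the data, it may be worth re-running them on your own Series. The following is a rough sketch using the standard-library `timeit` module instead of IPython's `%timeit`; the sizes, the seed, and the `candidates` dictionary are assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Three of the approaches from this thread, keyed by a display name.
candidates = {
    "s.index[s][-1]": lambda s: s.index[s][-1],
    "s[::-1].idxmax()": lambda s: s[::-1].idxmax(),
    "s[s].last_valid_index()": lambda s: s[s].last_valid_index(),
}

for n in (1_000, 100_000):
    # Roughly half of the values are True.
    s = pd.Series(rng.random(n) < 0.5, index=np.arange(n))
    for name, func in candidates.items():
        # Average wall-clock time per call over 100 calls.
        seconds = timeit.timeit(lambda: func(s), number=100) / 100
        print(f"n={n:>7}  {name:<26} {seconds * 1e6:10.1f} us")
```

The absolute numbers will differ by machine and pandas version, so only the relative ordering on your own data is meaningful.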

EDIT:

Another solution:

print(s[s==True].index[-1])

EDIT1: The solution

(s[s==True].index.tolist()[-1])

was in a deleted answer.

jezrael
  • isn't idxmax the same method? – Andy Hayden Dec 20 '15 at 21:23
  • What do you think? Why hasn't one of them been removed from pandas? I am curious. – jezrael Dec 20 '15 at 21:27
  • I think it *used* to be the case that `np.argmax` (and hence `.argmax`) would fall through to the pandas `.values` numpy array i.e. not return a Series. Now `np.argmax` returns a Series. – Andy Hayden Dec 20 '15 at 21:29
  • Up to you, though I would say for future answers: you should separate your timeit calls (that way it's easier to see which answer is for which call). :) That said, these timings are going to be very dependent on the size of s as well as the number (and position) of Trues. – Andy Hayden Dec 20 '15 at 21:34
  • Why is there such a large difference between the slowest and fastest runs? – user1507844 Dec 23 '15 at 05:16
  • Be aware that with idxmax the result is incorrect in case your Series doesn't contain any True. – kadee Dec 14 '21 at 08:20
11

Use last_valid_index:

In [9]:
s.tail(10)

Out[9]:
h    False
w     True
h    False
r     True
q    False
b    False
p    False
e    False
q    False
d    False
dtype: bool

In [8]:
s[s==True].last_valid_index()

Out[8]:
'r'
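A small sketch of how this behaves, including the case with no True values. The example Series are assumptions for illustration, and `s[s]` is used as a shorter equivalent of `s[s==True]` for a boolean Series.

```python
import pandas as pd

# Small illustrative Series (an assumption; the question uses random data).
s = pd.Series([False, True, False, True, False], index=list("abcde"))

# Boolean masking keeps only the True rows; last_valid_index then
# returns the label of the last remaining non-null row.
print(s[s].last_valid_index())  # d

# With no True values the filtered Series is empty and the result is None.
none_true = pd.Series([False, False], index=list("xy"))
print(none_true[none_true].last_valid_index())  # None
```

Returning None in the empty case makes the "no True found" situation easy to detect, unlike the idxmax approach.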
EdChum
  • This is the best solution. I think `last_valid_index` is clearer than `idxmax`. – onewhaleid Aug 16 '18 at 01:52
  • I think this behavior is not intended: "If all elements are non-NA/null, returns None." `False` is non-null. – moi Apr 16 '21 at 14:33
4

argmax gets the first True. Use argmax on the reversed Series:

In [11]: s[::-1].argmax()
Out[11]: 'e'

Here:

In [12]: s.tail()
Out[12]:
n     True
e     True
k    False
d    False
l    False
dtype: bool
Andy Hayden
  • I get timings of `1000 loops, best of 3: 638 µs per loop 1000 loops, best of 3: 284 µs per loop` comparing my method with yours +1 – EdChum Dec 20 '15 at 20:07
  • @EdChum one thing is a little annoying is that reversing creates a copy (IIUC)... you could drop to the values and use the numpy reversed view which may be slightly faster (but IMO much less readable) as essentially O(1). – Andy Hayden Dec 20 '15 at 21:26
  • @AndyHayden I guess that it's not immediately obvious why `argmax` works here, still it's quicker which is what usually counts – EdChum Dec 20 '15 at 21:32
  • Yeah, it is at worst O(n) and short circuits. But it definitely does seem more magical (less descriptive). – Andy Hayden Dec 20 '15 at 21:35
  • Be aware that with this solution the result is incorrect in case your Series doesn't contain any True. – kadee Dec 14 '21 at 08:21
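The idea from the comments of dropping to the underlying NumPy values and using a reversed view could be sketched as follows. The helper name `last_true_label` is hypothetical, and the explicit check for a Series without any True is an added assumption to address the caveat above.

```python
import pandas as pd


def last_true_label(s):
    """Return the index label of the last True in a boolean Series, or None."""
    values = s.to_numpy()
    if not values.any():
        # No True at all: argmax would otherwise point at a False entry.
        return None
    # values[::-1] is a NumPy view rather than a copy; argmax on it gives
    # the position of the first True counted from the end.
    pos_from_end = values[::-1].argmax()
    return s.index[len(values) - 1 - pos_from_end]


s = pd.Series([False, True, False, True, False], index=list("abcde"))
print(last_true_label(s))  # d
print(last_true_label(pd.Series([False, False], index=list("xy"))))  # None
```

Because the lookup is positional, this sketch also works when the index contains duplicate labels, as in the question.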