Find index of last true value in pandas Series or DataFrame

Question

I'm trying to find the index of the last True value in a pandas boolean Series. My current code is below. Is there a faster or cleaner way of doing this?

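import numpy as np
import pandas as pd
import string

# Random letter labels (with repeats) shared by a DataFrame and a boolean mask
index = np.random.choice(list(string.ascii_lowercase), size=1000)
df = pd.DataFrame(np.random.randn(1000, 2), index=index)
s = pd.Series(np.random.choice([True, False], size=1000), index=index)

# Label of the last True in the Series
last_true_idx_s = s.index[s][-1]
# Label of the last DataFrame row selected by the mask
last_true_idx_df = df[s].iloc[-1].name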

score 24 · Accepted Answer · edited May 23 '17 at 10:30


You can use idxmax, which is the same as argmax in Andy Hayden's answer:

print(s[::-1].idxmax())
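Why this works, as a small sketch on a hypothetical five-element Series (data and labels made up for illustration): reversing the Series puts the last True first, and idxmax returns the label of the first occurrence of the maximum, with True comparing greater than False.

```python
import pandas as pd

# Hypothetical data: the Trues sit at labels "b" and "d"
s = pd.Series([False, True, False, True, False], index=list("abcde"))

# s[::-1] runs e, d, c, b, a; the first True in that order is at "d",
# and idxmax returns the label of the first maximum (True > False)
last_true = s[::-1].idxmax()
print(last_true)  # d
```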

Comparing:

These timings depend heavily on the size of s and on the number (and position) of the Trues (thanks, Andy Hayden).

In [2]: %timeit s.index[s][-1]
The slowest run took 6.92 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 35 µs per loop

In [3]: %timeit s[::-1].argmax()
The slowest run took 6.67 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 126 µs per loop

In [4]: %timeit s[::-1].idxmax()
The slowest run took 6.55 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 127 µs per loop

In [5]: %timeit s[s==True].last_valid_index()
The slowest run took 8.10 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 261 µs per loop

In [6]: %timeit (s[s==True].index.tolist()[-1])
The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 239 µs per loop

In [7]: %timeit (s[s==True].index[-1])
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 227 µs per loop
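The timings only matter if the expressions agree. A quick sanity check on a small hypothetical Series with repeated labels (like the question's random letters) might look like this. Two assumptions are worth flagging: the mask is passed to the Index as a NumPy array via `to_numpy()`, and the argmax variant maps a position back to a label, because in pandas 1.0 and later `Series.argmax` returns an integer position rather than a label. For a bool Series, `s[s]` selects the same rows as `s[s==True]`.

```python
import pandas as pd

# Hypothetical data with repeated labels; the last True is at position 2 ("z")
s = pd.Series([True, False, True, False, False], index=list("xyzxy"))

results = {
    "s.index[mask][-1]": s.index[s.to_numpy()][-1],
    "s[::-1].idxmax()": s[::-1].idxmax(),
    "s[s].last_valid_index()": s[s].last_valid_index(),
    "s[s].index[-1]": s[s].index[-1],
    "s[s].index.tolist()[-1]": s[s].index.tolist()[-1],
    # argmax gives a position in the reversed Series; map it back to a label
    "s[::-1].argmax() as label": s.index[len(s) - 1 - s[::-1].argmax()],
}

for expr, label in results.items():
    print(f"{expr:28} -> {label}")
```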

EDIT:

Another solution:

print(s[s==True].index[-1])

EDIT1: The solution

(s[s==True].index.tolist()[-1])

was in a deleted answer.


answered Dec 20 '15 at 20:08

jezrael


isn't idxmax the same method? – Andy Hayden Dec 20 '15 at 21:23
What do you think? Why haven't some of them been removed from pandas? I am curious. – jezrael Dec 20 '15 at 21:27
I think it *used* to be the case that `np.argmax` (and hence `.argmax`) would fall through to the pandas `.values` numpy array i.e. not return a Series. Now `np.argmax` returns a Series. – Andy Hayden Dec 20 '15 at 21:29
Up to you, though I would say for future answers: you should separate your timeit calls (that way it's easier to see which answer is for which call). :) That said, these timings are going to be very dependent on the size of s as well as the number (and position) of Trues. – Andy Hayden Dec 20 '15 at 21:34
Why is there such a large difference between the slowest and fastest runs? – user1507844 Dec 23 '15 at 05:16

Be aware that with idxmax the result is incorrect if your Series doesn't contain any True. – kadee Dec 14 '21 at 08:20
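kadee's point can be reproduced on an all-False Series: idxmax still returns a label, because it just reports the first occurrence of the maximum value, which is False there. One way to guard against it is to check `s.any()` first; `last_true_label` below is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def last_true_label(s: pd.Series):
    """Hypothetical helper: label of the last True in a boolean Series, or None."""
    # idxmax on an all-False Series still returns a label, so bail out early
    if not s.any():
        return None
    return s[::-1].idxmax()

all_false = pd.Series([False, False, False], index=list("abc"))
print(all_false[::-1].idxmax())    # c  (misleading: there is no True)
print(last_true_label(all_false))  # None

mixed = pd.Series([True, False, True, False], index=list("abcd"))
print(last_true_label(mixed))      # c
```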

score 11 · Answer 2 · answered Dec 20 '15 at 19:51


Use last_valid_index:

In [9]: s.tail(10)
Out[9]:
h    False
w     True
h    False
r     True
q    False
b    False
p    False
e    False
q    False
d    False
dtype: bool

In [8]: s[s==True].last_valid_index()
Out[8]: 'r'
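A sketch of why this works, on hypothetical data: boolean indexing keeps only the True rows (for a bool Series `s[s]` is the same as `s[s==True]`), so every remaining value is non-NA and last_valid_index returns the last remaining label. If there is no True at all, the filtered Series is empty and last_valid_index returns None, which sidesteps the all-False pitfall of the idxmax/argmax approaches.

```python
import pandas as pd

s = pd.Series([False, True, True, False], index=list("wxyz"))

# Only the True rows survive the filter; all of them are non-NA,
# so the last "valid" label is simply the last True
print(s[s].last_valid_index())  # y

# No Trues: the filtered Series is empty and the result is None
none_true = pd.Series([False, False], index=list("ab"))
print(none_true[none_true].last_valid_index())  # None
```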

answered Dec 20 '15 at 19:51

EdChum


This is the best solution. I think `last_valid_index` is clearer than `idxmax`. – onewhaleid Aug 16 '18 at 01:52

I think this behavior is not intended: "If all elements are non-NA/null, returns None." `False` is non-null. – moi Apr 16 '21 at 14:33

score 4 · Answer 3 · answered Dec 20 '15 at 19:54


argmax finds the first True (the first occurrence of the maximum). Use argmax on the reversed Series:

In [11]: s[::-1].argmax()
Out[11]: 'e'

Here:

In [12]: s.tail()
Out[12]:
n     True
e     True
k    False
d    False
l    False
dtype: bool
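One caveat for newer pandas: in pandas 1.0 and later, `Series.argmax` returns the integer position of the maximum rather than its label, so `s[::-1].argmax()` no longer yields 'e' directly. A sketch using the five rows shown above as stand-in data: either switch to idxmax, or map the reversed position back to an original label.

```python
import pandas as pd

# The five rows from s.tail() above, used as stand-in data
s = pd.Series([True, True, False, False, False], index=list("nekdl"))

pos_in_reversed = s[::-1].argmax()  # a position in the reversed Series
label_via_idxmax = s[::-1].idxmax()
label_via_argmax = s.index[len(s) - 1 - pos_in_reversed]

print(pos_in_reversed)   # 3
print(label_via_idxmax)  # e
print(label_via_argmax)  # e
```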

answered Dec 20 '15 at 19:54

Andy Hayden


I get timings of `1000 loops, best of 3: 638 µs per loop 1000 loops, best of 3: 284 µs per loop` comparing my method with yours +1 – EdChum Dec 20 '15 at 20:07
@EdChum one thing that is a little annoying is that reversing creates a copy (IIUC)... you could drop to the values and use the numpy reversed view, which may be slightly faster (but IMO much less readable) since creating the view is essentially O(1). – Andy Hayden Dec 20 '15 at 21:26
@AndyHayden I guess it's not immediately obvious why `argmax` works here; still, it's quicker, which is what usually counts – EdChum Dec 20 '15 at 21:32
Yeah, it is at worst O(n) and short circuits. But it definitely does seem more magical (less descriptive). – Andy Hayden Dec 20 '15 at 21:35
Be aware that with this solution the result is incorrect if your Series doesn't contain any True. – kadee Dec 14 '21 at 08:21
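Andy Hayden's suggestion of dropping to the values can be sketched as follows; `last_true_label_np` is a hypothetical name. Reversing a NumPy array with `[::-1]` creates a view rather than a copy. The sketch also covers kadee's caveat: argmax returns 0 for an all-False array, so the helper checks that the position it found really holds a True, and it handles the empty case separately because argmax raises on an empty array. Whether it is actually faster depends on the size of the data and where the Trues are.

```python
import pandas as pd

def last_true_label_np(s: pd.Series):
    """Hypothetical helper: label of the last True via a reversed NumPy view."""
    vals = s.to_numpy(dtype=bool)
    if vals.size == 0:
        return None  # argmax raises ValueError on an empty array
    # vals[::-1] is a view (no copy); argmax gives the first True in it
    pos = vals.size - 1 - int(vals[::-1].argmax())
    # An all-False array also yields argmax 0, so confirm we hit a True
    return s.index[pos] if vals[pos] else None

s = pd.Series([True, False, True, False], index=list("pqrs"))
print(last_true_label_np(s))  # r
print(last_true_label_np(pd.Series([False, False], index=list("ab"))))  # None
```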
