How to count longest uninterrupted sequence in pandas

Question

Let's say I have a pd.Series like the one below:

s = pd.Series([False, True, False, True, True, True, False, False])

0    False
1     True
2    False
3     True
4     True
5     True
6    False
7    False
dtype: bool

I want to know the length of the longest run of consecutive True values. In this example, it is 3.

I tried it in a naive way:

s_list = s.tolist()
count = 0
max_count = 0
for item in s_list:
    if item:
        count +=1
    else:
        if count>max_count:
            max_count = count
        count = 0
print(max_count)

It prints 3 here, but for a Series that is all True it prints 0.

score 27 · Accepted Answer · edited Feb 22 '18 at 19:06

Option 1
Use the series itself to mask the cumulative sum of its negation, then use value_counts.

(~s).cumsum()[s].value_counts().max()

3

explanation

(~s).cumsum() is a pretty standard way to produce distinct True/False groups
```
0    1
1    1
2    2
3    2
4    2
5    2
6    3
7    4
dtype: int64
```
But you can see that the group we care about is represented by the 2s and there are four of them. That's because the group is initiated by the first False (which becomes True with (~s)). Therefore, we mask this cumulative sum with the boolean mask we started with.
```
(~s).cumsum()[s]

1    1
3    2
4    2
5    2
dtype: int64
```
Now we see the three 2s pop out and we just have to use a method to extract them. I used value_counts and max.
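
As a minimal sketch, Option 1 can be wrapped in a small helper. The function name `longest_true_run` and the explicit guard for a series with no True values are assumptions added for illustration and are not part of the original answer; the guard is there because an empty selection has no meaningful maximum count.

```python
import pandas as pd

def longest_true_run(s: pd.Series) -> int:
    """Length of the longest run of consecutive True values in a boolean Series."""
    # Assumption: treat "no True at all" as a run length of 0.
    if not s.any():
        return 0
    # Each False starts a new group id; keep only the ids at True positions.
    group_ids = (~s).cumsum()[s]
    # The most frequent id belongs to the longest run of True values.
    return int(group_ids.value_counts().max())

s = pd.Series([False, True, False, True, True, True, False, False])
print(longest_true_run(s))
```

The guard also covers an empty Series, and a series of all True values falls into a single group with id 0.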

Option 2
Use factorize and bincount

a = s.values
b = pd.factorize((~a).cumsum())[0]
np.bincount(b[a]).max()

3

explanation
The explanation is similar to that of Option 1. The main difference is in how I found the max. I use pd.factorize to tokenize the values into integers ranging from 0 to one less than the number of unique values. Given the actual values we had in (~a).cumsum(), we didn't strictly need this part. I used it because it's a general-purpose tool that could be used on arbitrary group names.

After pd.factorize I use those integer values in np.bincount which accumulates the total number of times each integer is used. Then take the maximum.
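
To illustrate the point about arbitrary group names, here is a small sketch of the intermediate values. The string labels are hypothetical stand-ins for `(~a).cumsum()`, chosen to follow the same grouping pattern; they are not from the original answer.

```python
import numpy as np
import pandas as pd

a = np.array([False, True, False, True, True, True, False, False])

# Hypothetical group labels following the same pattern as 1, 1, 2, 2, 2, 2, 3, 4.
labels = np.array(["a", "a", "b", "b", "b", "b", "c", "d"], dtype=object)

# factorize maps each distinct label to an integer code in order of appearance.
codes, uniques = pd.factorize(labels)

# Count how often each code appears among the True positions only.
counts = np.bincount(codes[a])

print(codes.tolist())   # [0, 0, 1, 1, 1, 1, 2, 3]
print(counts.tolist())  # [1, 3]
print(counts.max())     # 3
```

Because np.bincount needs non-negative integers, the factorize step is what makes the approach work when the group names are not already small integers.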

Option 3
As stated in the explanation of option 2, this also works:

a = s.values
np.bincount((~a).cumsum()[a]).max()

3

@piRSquared Adding a python groupby :-) cheers :-), learn a lot from yours , thank you Sir ! — BENY, Feb 21 '18 at 02:53
@piRSquared, thanks and learnt a new trick of using (~a).cumsum() — Allen Qin, Feb 21 '18 at 03:29
@piRSquared, how do I know where the longest sequence of trues occur? — Bernardo Trindade, Jan 06 '21 at 22:13
How could this code be modified to find the longest uninterrupted sequence of, say, a specific integer value? — cjstevens, Jun 02 '21 at 09:57
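
One possible way to address the last two comments, sketched under assumptions: the helper name `longest_run_of` and its return shape are made up for illustration and do not come from the answer. It builds a boolean mask with `s.eq(value)`, reuses the `(~mask).cumsum()` grouping, and reads the location off the index of the largest group. If several runs tie for the longest, it reports only one of them.

```python
import pandas as pd

def longest_run_of(s: pd.Series, value):
    """Return (first_label, last_label, length) of the longest run of `value`, or None."""
    mask = s.eq(value)
    if not mask.any():
        return None
    # Group ids restricted to the positions where the value matches.
    group_ids = (~mask).cumsum()[mask]
    # Id of the group with the most members, i.e. the longest run.
    best = group_ids.value_counts().idxmax()
    labels = group_ids[group_ids == best].index
    return labels[0], labels[-1], len(labels)

print(longest_run_of(pd.Series([False, True, False, True, True, True, False, False]), True))
print(longest_run_of(pd.Series([1, 5, 5, 2, 5, 5, 5]), 5))
```

The returned labels are index labels, so on a series with a non-default index they are not necessarily integer positions.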

BENY · Answer 2 · 2018-02-21T04:38:50.790

5

I think this could work

pd.Series(s.index[~s].values).diff().max()-1
Out[57]: 3.0

Outside pandas, we can also fall back on Python's itertools.groupby:

from itertools import groupby
max([len(list(group)) for key, group in groupby(s.tolist())])
Out[73]: 3

Update :

from itertools import compress
max(list(compress([len(list(group)) for key, group in groupby(s.tolist())],[key for key, group in groupby(s.tolist())])))
Out[84]: 3

edited Feb 21 '18 at 04:38

answered Feb 21 '18 at 02:45

BENY


This is very clean. – Tai Feb 21 '18 at 02:52
@wen, nice use of s.index[~s] – Allen Qin Feb 21 '18 at 03:40
Maybe I need to pay more time to learn python standard library. – Dawei Feb 21 '18 at 04:10
If all element is `False` it will return `8`, so the code should be `max([len(list(group)) for key, group in groupby(s.tolist()) if key])` – Dawei Feb 21 '18 at 04:21
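
A minimal sketch of the filtered version suggested in the last comment. The function name `longest_true_run_py` is hypothetical, and the `default=0` argument is an added assumption: once the False groups are filtered out, an all-False input leaves nothing for `max` to compare.

```python
from itertools import groupby

def longest_true_run_py(values) -> int:
    """Longest run of truthy items, counting only the groups whose key is truthy."""
    # groupby yields one (key, group) pair per run of equal consecutive items.
    run_lengths = (sum(1 for _ in group) for key, group in groupby(values) if key)
    # default=0 covers inputs that contain no True at all.
    return max(run_lengths, default=0)

values = [False, True, False, True, True, True, False, False]
print(longest_true_run_py(values))
```

With a pandas Series, `s.tolist()` can be passed in, as in the answer above.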

Tai · Answer 3 · 2018-02-21T04:02:21.680

2

Edit: As piRSquared mentioned, my previous solution needs a False added at both the beginning and the end of the series (equivalently, a True at each end of ~s). piRSquared kindly gave an answer based on that.

(np.diff(np.flatnonzero(np.append(True, np.append(~s.values, True)))) - 1).max()

My original attempt was

(np.diff(s.where(~s).dropna().index.values) - 1).max()

(This does not give the correct answer if the longest run of True values starts at the beginning or ends at the end of the series, as pointed out by piRSquared. Please use piRSquared's solution above. The original remains only for explanation.)

Explanation:

This finds the indices of the False parts and by finding the gaps between the indices of False, we can know the longest True.

s.where(s == False).dropna().index.values finds all the indices of False
```
array([0, 2, 6, 7])
```

We know that Trues live between the Falses. Thus, we can use np.diff to find the gaps between these indices.

    array([2, 4, 1])

Subtract 1 from each gap, since the Trues lie strictly between these indices.
Then take the maximum of the result.
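
The padded variant can be sketched step by step as follows. The function name `longest_true_run_diff` is hypothetical, and `np.concatenate` is swapped in for the nested `np.append` calls; the idea is otherwise the one described above.

```python
import numpy as np

def longest_true_run_diff(a: np.ndarray) -> int:
    """Longest run of True in a boolean array, via gaps between False positions."""
    # Mark the False positions, with a sentinel added at each end so that
    # runs touching the start or the end of the array are also bounded.
    padded = np.concatenate(([True], ~a, [True]))
    false_positions = np.flatnonzero(padded)
    # The gap between neighbouring False positions, minus 1, is the number
    # of True values between them.
    return int((np.diff(false_positions) - 1).max())

a = np.array([False, True, False, True, True, True, False, False])
print(longest_true_run_diff(a))
```

For the example array, the padded False positions are 0, 1, 3, 7, 8, 9, giving gaps of 1, 2, 4, 1, 1 and therefore run lengths of 0, 1, 3, 0, 0.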

edited Feb 21 '18 at 04:02

answered Feb 21 '18 at 02:40

Tai


Umm nice solution – BENY Feb 21 '18 at 02:46
Agreed this is nice. However, if you have the longest `True` sequence at the beginning or the end of the array, your diff will not catch it. You need to append `False` to the ends, then do it. Also, you don't need `s == False`, `~s` will do. – piRSquared Feb 21 '18 at 02:53
This is how I would have done it. Feel free to add it to your answer as it is the same concept, only if you want to (-: `(np.diff(np.flatnonzero(np.append(True, np.append(~s.values, True)))) - 1).max()` Though I'd suggest formatting nicer. – piRSquared Feb 21 '18 at 02:56
@piRSquared thank you for offering the solution to this. I appreciate it. – Tai Feb 21 '18 at 03:01

Allen Qin · Answer 4 · 2018-02-21T03:30:50.620

2

You can use (inspired by @piRSquared answer):

s.groupby((~s).cumsum()).sum().max()
Out[513]: 3.0
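
A short sketch of why the groupby one-liner works, showing the intermediate values. The cast to int with `astype(int)` is an assumption added here so the sums are plain integers regardless of how a given pandas version sums booleans.

```python
import pandas as pd

s = pd.Series([False, True, False, True, True, True, False, False])

# Every False bumps the counter, so each group is one False plus the Trues after it.
group_ids = (~s).cumsum()  # 1, 1, 2, 2, 2, 2, 3, 4

# Summing within each group counts the True values in that group.
run_lengths = s.astype(int).groupby(group_ids).sum()

print(run_lengths.tolist())    # [1, 3, 0, 0]
print(int(run_lengths.max()))  # 3
```

Since the leading False in each group contributes 0 to the sum, no extra masking step is needed here.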

Another option is to use a lambda function:

s.to_frame().apply(lambda x: s.loc[x.name:].idxmin() - x.name, axis=1).max()
Out[429]: 3

edited Feb 21 '18 at 03:30

answered Feb 21 '18 at 02:58

Allen Qin


score 2 · Answer 5 · answered Feb 21 '18 at 05:29


Your code was actually very close. It becomes perfect with a minor fix:

count = 0
maxCount = 0
for item in s:
    if item:
        count += 1
        if count > maxCount:
            maxCount = count
    else:
        count = 0
print(maxCount)


FatihAkici


score 1 · Answer 6 · answered Feb 21 '18 at 02:34


I'm not exactly sure how to do it with pandas but what about using itertools.groupby?

>>> from itertools import groupby
>>> import pandas as pd
>>> s = pd.Series([False, True, False,True,True,True,False, False])
>>> max(sum(1 for _ in g) for k, g in groupby(s) if k)
3


G_M

