
I have a series of True and False values and need to find all groups of consecutive True values. That is, I need the start index and end index of each run of neighboring True values.

The following code gives the intended result but is very slow, inefficient and clumsy.

import pandas as pd

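def groups(ser):
    """Return (start, end) index labels of each run of consecutive True values."""
    g = []

    flag = False   # True while we are inside a run of True values
    start = None   # index label where the current run began
    prev = None    # index label of the previous element
    for idx, s in ser.items():
        if flag and not s:
            # the run ended at the previous element
            g.append((start, prev))
            flag = False
        elif not flag and s:
            # a new run starts here
            start = idx
            flag = True
        prev = idx
    if flag:
        # the series ended inside a run
        g.append((start, prev))
    return g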

if __name__ == "__main__":
    ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
    print(ser)

    g = groups(ser)
    print("\ngroups of True:")
    for start, end in g:
        print("from {} until {}".format(start, end))

Output:

0      True
1      True
2     False
3     False
4      True
5     False
6     False
7      True
8      True
9      True
10     True
11    False
12     True
13    False
14     True

groups of True:
from 0 until 1
from 4 until 4
from 7 until 10
from 12 until 12
from 14 until 14

There are similar questions out there, but none of them looks for the indices where the groups start and end.
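Since the loop above is described as slow, here is a vectorized alternative for comparison. It is a sketch that is not part of the original post: it uses NumPy's `np.diff` on the padded boolean values to locate run starts and ends, and the function name `true_runs` is hypothetical. It maps positions back to index labels, so it also works for non-integer indexes.

```python
import numpy as np
import pandas as pd


def true_runs(ser):
    """Hypothetical helper: (start, end) index labels of each run of True values."""
    values = ser.to_numpy(dtype=bool)
    # Pad with False on both sides so runs touching either edge still show a transition.
    padded = np.concatenate(([False], values, [False])).astype(np.int8)
    # diff is +1 where a run starts and -1 one position after a run ends.
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)       # position of the first True in each run
    ends = np.flatnonzero(change == -1) - 1    # position of the last True in each run
    # Translate positions back to index labels.
    labels = ser.index
    return list(zip(labels[starts].tolist(), labels[ends].tolist()))


ser = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)
for start, end in true_runs(ser):
    print("from {} until {}".format(start, end))
```

On the example series this should reproduce the "groups of True" lines shown above, without a Python-level loop over every element.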

user7431005

2 Answers


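It's common to use cumsum on the negation to label consecutive blocks: every False increments the counter, so all True values within one block share the same label, which can then be passed to groupby. For example: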

s = ser  # the boolean series from the question
for _, x in s[s].groupby((1 - s).cumsum()):
    print(f'from {x.index[0]} to {x.index[-1]}')

Output:

from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
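To make the labelling step visible, the sketch below prints each value next to its cumulative-sum label and then wraps the idiom in a function that returns (start, end) tuples like the question's `groups`. It uses `~s` for the negation, which gives the same labels as `1 - s` on a boolean series; the function name `groups_cumsum` is hypothetical.

```python
import pandas as pd

s = pd.Series([1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1], dtype=bool)

# Every False increments the counter, so all True values in one run
# share the same label (the number of Falses seen so far).
labels = (~s).cumsum()
print(pd.DataFrame({"value": s, "label": labels}))


def groups_cumsum(s):
    """Hypothetical helper: (first, last) index label of each run of True."""
    return [(x.index[0], x.index[-1]) for _, x in s[s].groupby((~s).cumsum())]


print(groups_cumsum(s))
```

Only the labels of the True rows matter: the False rows are dropped by `s[s]`, so the gaps in the label sequence are harmless.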
Quang Hoang
  • Nice. Seems to work. I'm still struggling to understand how it works. `s[s]` returns only the True values (in my case 9 items). Then you group it by the cumulative sum of the negation of the original series (with 15 elements). How does `groupby` work when you call it on a series with 9 elements and pass it a series with 15 elements? – user7431005 Mar 04 '21 at 14:38
  • Pandas aligns the index of the grouping series to the series/dataframe being grouped. If you pass a NumPy array to `groupby` instead, it must have the same length. – Quang Hoang Mar 04 '21 at 14:40
  • Thanks, it is clearly working, so I'll accept it. EDIT: got it now :-) – user7431005 Mar 04 '21 at 14:47
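The alignment described in the comments above can be checked on a tiny series. This sketch (toy data, not from the thread) groups a 3-row subset by a 4-row Series key, which pandas aligns by index label, and then shows that an unaligned NumPy array of the wrong length is rejected with a `ValueError`.

```python
import pandas as pd

s = pd.Series([True, True, False, True], index=[10, 11, 12, 13])
keys = (~s).cumsum()   # 4 labels, index 10..13
sub = s[s]             # 3 rows, index 10, 11, 13

# A Series key is aligned to `sub` by index label; the key for label 12 is simply unused.
groups = {label: x.index.tolist() for label, x in sub.groupby(keys)}
print(groups)

# A NumPy array carries no index, so pandas cannot align it and requires equal length.
try:
    sub.groupby(keys.to_numpy())
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```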

You can use `itertools.groupby`:

from itertools import groupby
from operator import itemgetter

# Index labels where `ser` is True
a = ser[ser].index.tolist()

# In a run of consecutive integers, (position in list - value) stays constant,
# so groupby starts a new group exactly where a gap occurs.
for k, g in groupby(enumerate(a), lambda ix: ix[0] - ix[1]):
    l = list(map(itemgetter(1), g))
    print(f'from {l[0]} to {l[-1]}')

Output:

from 0 to 1
from 4 to 4
from 7 to 10
from 12 to 12
from 14 to 14
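The grouping key is the only subtle part of this answer. The sketch below prints the key for each element and packages the loop as a function returning (start, end) tuples; `runs_from_positions` is a hypothetical name. It uses value minus position as the key, which is constant exactly when position minus value is, so it groups identically. Like the answer, it assumes the index labels are consecutive integers.

```python
from itertools import groupby
from operator import itemgetter


def runs_from_positions(positions):
    """Hypothetical helper: (first, last) of each run of consecutive integers.

    itertools.groupby only merges *adjacent* items with equal keys, and
    value - position is constant inside a run, so each run becomes one group.
    """
    runs = []
    for _, grp in groupby(enumerate(positions), key=lambda ix: ix[1] - ix[0]):
        values = list(map(itemgetter(1), grp))
        runs.append((values[0], values[-1]))
    return runs


a = [0, 1, 4, 7, 8, 9, 10, 12, 14]  # what ser[ser].index.tolist() gives for the question's series
for i, v in enumerate(a):
    print(i, v, v - i)  # position, value, grouping key
print(runs_from_positions(a))
```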
Mayank Porwal