28

Say I have a bunch of numbers in a numpy array and I test them against a condition, which returns a boolean array:

import numpy as np

np.random.seed(3456)
a = np.random.rand(8)
condition = a > 0.5

And with this boolean array I want to count all of the lengths of consecutive occurrences of True. For example, if I had [True,True,True,False,False,True,True,False,True] I would want to get back [3,2,1].

I can do that using this code:

length, count = [], 0
for i in range(len(condition)):

    if condition[i]:
        count += 1
    elif not condition[i] and count > 0:
        # a run of True just ended, so record its length
        length.append(count)
        count = 0

    # close off a run that reaches the end of the array
    if i == len(condition) - 1 and count > 0:
        length.append(count)

print(length)

But is there anything already implemented for this, or a Python, NumPy, SciPy, etc. function that counts the lengths of consecutive occurrences in a list or array for a given input?

pbreach
  • 16,049
  • 27
  • 82
  • 120
  • possible duplicate of [Numpy grouping using itertools.groupby performance](http://stackoverflow.com/questions/4651683/numpy-grouping-using-itertools-groupby-performance) – simonzack Jun 21 '14 at 13:49

5 Answers

55

If you already have a numpy array, this is probably going to be faster:

>>> condition = np.array([True,True,True,False,False,True,True,False,True])
>>> np.diff(np.where(np.concatenate(([condition[0]],
                                     condition[:-1] != condition[1:],
                                     [True])))[0])[::2]
array([3, 2, 1])

It detects where chunks begin, handles the first and last chunk as special cases, computes the differences between chunk starts, and discards the lengths that correspond to False chunks.
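
For readability, here is the same computation unpacked into intermediate steps (my own breakdown of the one-liner, using the same example array):

import numpy as np

condition = np.array([True, True, True, False, False, True, True, False, True])

# True wherever a new chunk (run of equal values) starts; the leading
# condition[0] marks index 0 only when the array opens with a True run,
# and the trailing True is a sentinel that closes the final chunk.
starts = np.concatenate(([condition[0]],
                         condition[:-1] != condition[1:],
                         [True]))

boundaries = np.where(starts)[0]   # [0, 3, 5, 7, 8, 9]
lengths = np.diff(boundaries)      # all run lengths, alternating True/False: [3, 2, 2, 1, 1]
print(lengths[::2])                # the True runs -> [3 2 1]

Because the first recorded boundary is always the start of a True run, taking every other difference keeps exactly the True-run lengths.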

Jaime
  • 65,696
  • 17
  • 124
  • 159
  • I'm passing this function to groupby and resample methods on pandas dataframes, so I guess it would ultimately be a numpy array. In this case speed isn't much of an issue, but noted in case of much larger datasets. – pbreach Jun 21 '14 at 16:21
  • I found this to be a couple orders of magnitude faster for 1e6 bools than the itertools approach. Thanks! – sfjac Feb 08 '18 at 23:06
  • Nice one, thanks! Do you perhaps have a suggestion how I could adapt your code such that I could do this row-wise on a 2D numpy array? – pr94 Mar 04 '21 at 08:30
  • 1
    @pr94 Doing it row-wise will likely give different-length arrays for each row. I would guess you would have to do it one row at a time, and thus you would just need to loop it somehow and add the extra index `[0,:]` instead of `[0]`. – goryh Apr 26 '21 at 21:04
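
Following up on that last comment, a minimal sketch of the row-by-row loop (the helper name true_run_lengths and the 2-D example array are mine; rows generally yield different numbers of runs, so the results go into a list):

import numpy as np

def true_run_lengths(row):
    # The same expression as in the answer above, applied to one 1-D boolean row.
    return np.diff(np.where(np.concatenate(([row[0]],
                                            row[:-1] != row[1:],
                                            [True])))[0])[::2]

a2d = np.array([[True, True, False, True],
                [False, True, True, True]])

per_row = [true_run_lengths(row) for row in a2d]
# -> [array([2, 1]), array([3])]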
21

Here's a solution using itertools (it's probably not the fastest solution):

import itertools
condition = [True, True, True, False, False, True, True, False, True]
[sum(1 for _ in group) for key, group in itertools.groupby(condition) if key]

Out:
[3, 2, 1]
sfjac
  • 7,119
  • 5
  • 45
  • 69
usual me
  • 8,338
  • 10
  • 52
  • 95
  • Definitely a very pythonic answer! Actually this is much faster than my code snippet above: about 0.2 s compared to ~1-2 s. – pbreach Jun 21 '14 at 14:06
  • It worked... but then it started showing me this error: `The truth value of an array with more than one element is ambiguous`. Out of the blue, no idea why. It works in IDLE, but not in PyCharm. – Piotr Kamoda Apr 01 '16 at 08:59
  • 1
    This is slightly faster if you use len(list(group)) instead of sum( 1...), but still significantly slower than @Jaime's answer if you already have a numpy array. – sfjac Feb 08 '18 at 23:08
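
For completeness, the len(list(group)) variant mentioned in the comment above looks like this (same example list as in the answer):

import itertools

condition = [True, True, True, False, False, True, True, False, True]
[len(list(group)) for key, group in itertools.groupby(condition) if key]
# -> [3, 2, 1]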
3

You can also count the distances between consecutive False values by looking at the indices (the result of np.where) of the inverse of your condition array. The trick is ensuring the boolean array starts with a False. Basically, you're counting the distances between the boundaries around your runs of True.

condition = np.array([True, True, True, False, False, True, True, False, True, False])
if condition[0]:
    condition = np.concatenate([[False], condition])

idx = np.where(~condition)[0]

In the final step, you need to subtract 1 from these values so that both the left and right edges are removed.

>>> np.ediff1d(idx) - 1
array([3, 0, 2, 1])
blalterman
  • 565
  • 7
  • 17
2
np.unique((~arr).cumsum()[arr], return_counts=True)[1]
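
A short demonstration with the question's example array, plus my reading of how it works: (~arr).cumsum() increases by one at every False, so all elements of a given True run share the same value; indexing with [arr] keeps only the True positions, and np.unique(..., return_counts=True) then counts each run.

import numpy as np

arr = np.array([True, True, True, False, False, True, True, False, True])
np.unique((~arr).cumsum()[arr], return_counts=True)[1]
# -> array([3, 2, 1])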
theodosis
  • 894
  • 5
  • 15
0

If `t` is the numpy array and it is sorted in ascending order, then:

d = np.diff(t)
d_incr = np.argwhere(d > 0).flatten()
d_incr = np.insert(d_incr, 0, 0)

The array `d_incr` will contain the indices where a change occurred, allowing one to perform operations on the groups of values between `d_incr[i-1]` and `d_incr[i]` for `i in range(1, d_incr.size)`.
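
As a hedged sketch of that last step (the example array t and the names starts and lengths are mine; note the +1 when mapping change positions in d back to positions in t):

import numpy as np

t = np.array([1, 1, 1, 2, 2, 3])             # sorted in ascending order
d = np.diff(t)
d_incr = np.argwhere(d > 0).flatten()        # positions in d where the value increases

starts = np.concatenate(([0], d_incr + 1))   # group start indices in t: [0, 3, 5]
lengths = np.diff(np.append(starts, t.size)) # -> array([3, 2, 1]), one length per group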

rocketman
  • 180
  • 3
  • 9