Group consecutive True in 1-D numpy array

Question

Suppose we have a boolean array x=np.array([True, True, False, True, False]). There are two consecutive group of True. What I want is to create a list of boolean arrays l where each array in l contains exactly one set of consecutive True. For instance, x should be identical to y defined by

y = np.zeros_like(x)
for e in l:
    y = y|e

So far my only successful attempt on this is using the consecutive function by https://stackoverflow.com/a/7353335/4755229

def consecutive_bools(bool_input):
    consecutive_idx = consecutive(np.argwhere(bool_input).flatten())
    ret = [np.zeros_like(bool_input) for i in range(len(consecutive_idx))]
    for i, idx in enumerate(consecutive_idx):
        ret[i][idx] = True
    return ret

This seems overly complicated. Is there any better (concise, and possibly faster) way of doing this?

This list seems like an extremely memory-intensive, slow-to-generate representation of the information it would carry. — user2357112, May 01 '23 at 12:11
@user2357112 The reason I want this kind of result is because I need to index arrays based on this. Basically, I have a long 1-D data, and I need to find sub-arrays of this based on several different criteria. I have a long list of boolean arrays of identical shapes that are constructed based on different criteria, and I could do `&` or `|` operations between different combinations of these arrays to make specific "condition" array on demand. I could, at some point, make an array of unsigned integers to represent bitmasks... — Hojin Cho, May 01 '23 at 12:22

Ben Grossmann · Answer 1 · 2023-05-01T13:08:43.480

Consider the following:

import numpy as np

x=np.array([True, True, False, True, False])

idx, = np.where(np.insert(x,0,False) ^ np.insert(x,-1,False))

l = [np.zeros_like(x),np.zeros_like(x)]
l[0][idx[0]:idx[1]] = True
l[1][idx[2]:idx[3]] = True

The idea here is that the elements of idx are the indices of any switch from True to False or vice versa. Since you have exactly 2 consecutive groups of True, idx will have exactly 4 elements.

For an arbitrary number of consecutive groups:

idx, = np.where(np.insert(x,0,False) ^ np.insert(x,-1,False))

l = [np.zeros_like(x) for _ in range(len(idx)//2)]
for a,p in zip(l,np.split(idx,np.arange(2,len(idx),2))):
    a[slice(*p)] = True

Mechanic Pig · Accepted Answer · 2023-05-01T16:33:11.177

An interesting method is to construct the start and stop of each segment, and then construct an array through np.arange(x.size). Compare it and all starts with >=, and compare it and all stops with <. The logical and of the two results yields the desired output:

def my_consecutive_bools(ar):
    indices, = np.concatenate([ar[:1], ar[:-1] != ar[1:], ar[-1:]]).nonzero()
    arange = np.arange(ar.size)
    return np.logical_and(arange >= indices[::2, None],
                          arange < indices[1::2, None])

>>> x = np.array([True, True, False, True, False])
>>> my_consecutive_bools(x)
array([[ True,  True, False, False, False],
       [False, False, False,  True, False]])

This method works well on some small arrays, but its time complexity is high. For large arrays, you can simply iterate over start and stop to assign values:

def my_consecutive_bools_loop(ar):
    indices, = np.concatenate([ar[:1], ar[:-1] != ar[1:], ar[-1:]]).nonzero()
    result = np.zeros((indices.size // 2, ar.size), bool)
    for row, start, stop in zip(result, indices[::2], indices[1::2]):
        row[start:stop] = True
    return result

Simple benchmark:

In [_]: rng = np.random.default_rng()

In [_]: small = rng.choice([True, False], 100, p=[0.8, 0.2])

In [_]: big = rng.choice([True, False], 100000, p=[0.8, 0.2])

In [_]: %timeit consecutive_bools(small)
109 µs ± 286 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [_]: %timeit my_consecutive_bools(small)
13.3 µs ± 46.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [_]: %timeit my_consecutive_bools_loop(small)
20 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [_]: %timeit consecutive_bools(big)
699 ms ± 6.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [_]: %timeit my_consecutive_bools(big)
2.98 s ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [_]: %timeit my_consecutive_bools_loop(big)
33.4 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Alain T. · Answer 3 · 2023-06-03T15:47:09.433

This could be achieved by creating an array with each True streak assigned a distinct group identifier. Using a cumulative sum of the inverted True/False can produce group numbers that remain the same for consecutive True positions. Intersecting this with the original array will place the False position in group zero (which we can exclude). By comparing the unique group numbers to the sequence of groups (in a broadcast), the expected matrix of True streaks is obtained:

import numpy as np
    
x=np.array([True, True, False, True, False])

groups = (~x).cumsum() + 1
trues  = np.unique(groups[x])[:,None] == groups * x

print(trues)

[[ True  True False False False]
 [False False False  True False]]

Works with any number of groups (which you don't need to know in advance):

x=np.array([False,True,True,True,False,True,False,False,True,True,True])

groups = (~x).cumsum() + 1
trues  = np.unique(groups[x])[:,None] == groups * x

print(trues)

[[False  True  True  True False False False False False False False]
 [False False False False False  True False False False False False]
 [False False False False False False False False  True  True  True]]

This should be much faster than any solution involving loops. It is a bit more concise as well.

Group consecutive True in 1-D numpy array

3 Answers3