Python/Numpy/Boolean Indexing: Modify boolean array at locations with N consecutive True values

Question

Given a numpy boolean array

arr = np.array([1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])

I would like to indicate where there are at least n consecutive true values (from left to right).

For n = 2:

#         True 2x (or more) in a row
#            /  \        /     \
arr = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1]

#                 becomes:

res = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
#               ^-----------^--^-------- A pattern of 2 or more consecutive True's ends at each of these locations

For n = 3:

#          True 3x (or more) in a row
#                        /     \
arr = [1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1]

#                 becomes:

res = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
#                              ^-------- A pattern of 3 or more consecutive True's ends at this location

Is there a pythonic approach to this without using a for loop to iterate over each element? Performance is important here as my application will perform this operation 1000's of times on boolean arrays containing 1000's of elements.

Notes worth mentioning:

n can be any value above 2
n-consecutive patterns can appear anywhere in the array; beginning, middle or end.
The shape of the result array must match that of the original array.

Any help would be very much appreciated.

Benchmarks on answers

fully vectorized by alain-t:
10000 loops, best of 5: 0.138 seconds per loop, worse of 5: 0.149 seconds per loop

pad/shift by mozway:
10000 loops, best of 5: 1.62 seconds per loop, worse of 5: 1.71 seconds per loop

sliding_window_view by kubatucka (with padding by mozway):
10000 loops, best of 5: 1.15 seconds per loop, worse of 5: 1.54 seconds per loop

Alain T. · Accepted Answer · 2021-11-09T13:29:25.837

2

You can multiply each element with its predecessor. This will give you 1s for sequences of two or more. Do it again on the result to get sequences of 3 or more:

import numpy as np
arr = np.array([1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])

arr[1:] *= arr[:-1]   # two consecutive
arr[0]  = 0           # 1st '1' isn't a two-consec.
arr[1:] *= arr[:-1]   # three consecutive 

print(arr)
[0 0 0 0 0 0 0 0 1 0 0]

You may also try it this way (which is a bit faster):

import numpy as np
arr = np.array([1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])

arr[2:] *= arr[:-2] * arr[1:-1]
arr[:2] = 0

print(arr)
[0 0 0 0 0 0 0 0 1 0 0]

A generalized solution to n-consecutives could not use the second approach but can be performed in a loop using the 1st one:

n = 3
arr[1:]  *= arr[:-1]
arr[:n-1] = 0
for i in range(n-2):
    arr[1:] *= arr[:-1]

Note that this loses some of the benefits of vectorization.

For a fully vectorized n-consecutive approach, you could select the matching differences between the running maximum of 0 positions with all item positions. The 1 positions of the result will contain the number of consecutive 1s at that position (and the 0 positions will be zero)

n = 3
i = np.arange(arr.size)+1
mask = n <= i-np.maximum.accumulate((1-arr)*i) #True/False array

print(mask*1)
[0 0 0 0 0 0 0 0 1 0 0]

Visually:

arr                  [ 1  0  1  1  0  0  1  1  1  0   1  ]
i                    [ 1  2  3  4  5  6  7  8  9  10  11 ] -+
(1-arr)*i            [ 0  2  0  0  5  6  0  0  0  10  0  ]  |-> differences
maximum.accumulate   [ 0  2  2  2  5  6  6  6  6  10  10 ] -+     |
i-np.maximum...      [ 1  0  1  2  0  0  1  2  3  0   1  ] <------+

As a side benefit from this approach, you actually get all the n-consecutive values in one operation from i-np.maximum.accumulate((1-arr)*i) so you can store them and check for different values of n without redoing the calculations.

edited Nov 09 '21 at 13:29

answered Nov 08 '21 at 16:02

Alain T.

40,517
4
31
51

Thank you for your answer @alain-t. How would you handle a situation where there *was* an n-consec at the beginning? – Craig Nathan Nov 08 '21 at 18:17
1

It should work as-is. Setting arr[0]=0 is merely a base condition. Every additional assignments after that will filter all the n-1, n-2, ... positions but will properly maintain the nth '1' even if the array starts with a series of 1s. – Alain T. Nov 08 '21 at 18:20
Ah yes you are correct. Thanks! I will just need to run a benchmark when I get home tonight before accepting an answer – Craig Nathan Nov 08 '21 at 18:41
Thank you for the update. Would you mind clarifying how *n* would be used in your faster code? If it was provided as an input, how could your code be used to find the result for any value of *n* (where *n* >= 2)? – Craig Nathan Nov 08 '21 at 20:34
Thank you for the full vectorized solution! This is exactly the type of approach I am looking for. I really appreciate your persistence on this. I'm just going to run benchmarks on all 3 answers in a couple hours. I'll let you know the results. – Craig Nathan Nov 08 '21 at 21:45
I like the last approach (+1), it's the kind of things that I would do with pandas for grouping ;) – mozway Nov 09 '21 at 08:25
The visualization is a nice addition to view your though process (+1). Quite the artful solution you came up with here! – Craig Nathan Nov 09 '21 at 10:12

score 1 · Answer 2 · answered Nov 08 '21 at 16:04

1

Numpy only:

from numpy.lib.stride_tricks import sliding_window_view

arr = np.array([1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
res = sliding_window_view(arr , window_shape = 3).all(axis=1) # window_shape <- n
>>> array([False, False, False, False, False, False,  True, False, False])

answered Nov 08 '21 at 16:04

kubatucka

555
5
15

The result is arr - window +1 in length. – kubatucka Nov 08 '21 at 16:06
@kabutucka thank you for your help. In my case, the result array needs to maintain the same shape as the initial array. Is that possible using this approach? – Craig Nathan Nov 08 '21 at 18:18
1

@Craig you should be able to pad kubatucka's solution using `np.pad` like I use in my answer: `np.pad(sliding_window_view(arr, window_shape=n).all(axis=1), (n-1,0), constant_values=0)` – mozway Nov 08 '21 at 20:01
Okay thanks! I will try that out – Craig Nathan Nov 08 '21 at 20:36

mozway · Answer 3 · 2021-11-08T18:28:17.520

1

You can pad/shift the sequence using numpy.pad and compare with the previous value, finally multiply by the original to keep only the 1s:

b = np.pad(arr, (1,0), constant_values=0)[:-1]
# b: array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0])
(arr==b)*arr

output: array([0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0])

generic solution:

n = 3
(sum([np.pad(arr, (i, 0), constant_values=0)[:len(arr)]
      for i in range(n)]) == n)*arr

output: array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

NB. the trick here is to sum the shifted values, if the n previous values are 1, the sum is n

edited Nov 08 '21 at 18:28

answered Nov 08 '21 at 16:33

mozway

194,879
13
39
75

Thanks @mozway. Which value in line 1 represents *n*? – Craig Nathan Nov 08 '21 at 18:20
1

@CraigNathan you need to repeat the process, I added an update ;) – mozway Nov 08 '21 at 18:28
Okay thank you! I will benchmark when I get home this evening – Craig Nathan Nov 08 '21 at 18:41

Python/Numpy/Boolean Indexing: Modify boolean array at locations with N consecutive True values

3 Answers3

generic solution: