Python: Counting instances of variable size subarray in numpy array

Question

I have a smallish 1D NumPy array with a length on the order of 100.

I want to find the number of times a sub-array occurs. Assume that each element of the array is either 1 or 0. I want to count the instances where at least three 0s occur in a row.

For np.array([0,0,0,0,1,0,0,1,0,0,0]) I would like to return 2

For np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0]) I would like to return 2

For np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]) I would like to return 5

I have tried casting to string and using string.count(). This works very well, but I need a solution that is a lot quicker. I am hitting this function millions of times a minute.

At present, I am looping over the array, which is slow, but surprisingly far quicker (4X) than casting to string (I know casting is slow and string manipulation is slower...).

Any ideas would be appreciated.

Regarding the itertools suggestion, I wrote a small performance check:

import numpy as np
import itertools
import time

def itertools_solution(full_array):
  trueFalse = full_array == 0
  count = [ sum( 1 for _ in group ) for key, group in itertools.groupby( trueFalse ) if key ]
  above = [val for val in count if val >= 3]
  return len(above)

def looping_solution(full_array):
  total_count = 0
  running_count = 0
  for counter, val in enumerate(full_array):
    if val == 0:
      running_count += 1
      if running_count == 3:
        total_count += 1
    else:
      running_count = 0
  return total_count

a = np.array([[0,0,0,0,2,0,0,5,0,0,0,5,5,5],
[0,0,0,0,0,1,1,0,1,4,0,0,4,4],
[0,0,1,1,0,0,4,4,4,0,4,0,0,1],
[3,2,2,3,3,0,0,3,2,6,6,6,0,0],
[0,1,4,5,0,4,0,0,0,5,0,2,1,0],
[0,0,3,6,6,6,0,0,0,2,2,3,3,6],
[2,0,0,2,5,5,5,0,0,0,5,0,0,0],
[1,3,0,0,1,3,3,6,6,0,0,4,6,0],
[5,5,5,0,0,2,2,2,5,0,0,0,2,2],
[6,6,6,0,0,0,6,0,3,3,3,0,0,3],
[4,4,0,4,4,0,0,1,0,1,1,1,0,0]]).flatten()

time_start = time.time()
for cnt in range(1000):
  itertools_solution(a)
print('itertools took %f seconds' % (time.time() - time_start))
time_start = time.time()
for cnt in range(1000):
  looping_solution(a)
print('looping took %f seconds' % (time.time() - time_start))

with results:

itertools took 0.185000 seconds
looping took 0.038001 seconds

Although this works, it unfortunately does not address my performance issue.

Possible duplicate of [Count consecutive occurences of values varying in length in a numpy array](https://stackoverflow.com/questions/24342047/count-consecutive-occurences-of-values-varying-in-length-in-a-numpy-array) — vielkind, Jul 15 '18 at 19:00

score 1 · Accepted Answer · answered Jul 15 '18 at 21:48

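We can find the positions of all ones (after padding the array with a 1 at each end), diff them, and count the instances where the diff is greater than n. A diff greater than 3 between two consecutive ones means at least three zeros lie between them. Note that this assumes the array contains only 0s and 1s.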

>>> def zero_ranges(arr, n):
...    return np.where(np.diff(np.where(np.concatenate(([1], arr, [1]))==1)[0])>n)[0].size
...
>>> zero_ranges(np.array([0,0,0,0,1,0,0,1,0,0,0]), 3)
2
>>> zero_ranges(np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0]), 3)
2
>>> zero_ranges(np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]), 3)
5
>>>
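
To make the one-liner above easier to follow, here is a step-by-step sketch of the same padding-and-diff idea. The name count_zero_runs is hypothetical, and testing `!= 0` instead of `== 1` is an assumption added here so that arrays with other non-zero values (like the benchmark array in the question) are also handled; for strictly binary input it should behave like zero_ranges.

```python
import numpy as np

def count_zero_runs(arr, n=3):
    """Count runs of at least n consecutive zeros (hypothetical helper)."""
    arr = np.asarray(arr)
    # Pad with a non-zero sentinel at both ends so that runs touching
    # the start or end of the array are bounded on both sides.
    padded = np.concatenate(([1], arr, [1]))
    # Indices of every non-zero element, including the two sentinels.
    # Assumption: any non-zero value breaks a run of zeros.
    breaks = np.flatnonzero(padded != 0)
    # A gap of g between neighbouring breaks means g - 1 zeros lie between them,
    # so g > n is equivalent to "at least n zeros in a row".
    gaps = np.diff(breaks)
    return int(np.count_nonzero(gaps > n))

print(count_zero_runs(np.array([0,0,0,0,1,0,0,1,0,0,0])))  # 2
```

Whether this is fast enough for arrays of length ~100 called millions of times is not established here; for such short inputs the fixed overhead of each NumPy call may dominate, so it is worth timing against the plain loop.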
