I have a smallish 1D NumPy array, with a length on the order of 100.
I want to count the number of times a sub-array occurs. Assume that each element of the array is either 1 or 0. I want to count the runs in which at least three 0s occur in a row (a longer run still counts only once).
For np.array([0,0,0,0,1,0,0,1,0,0,0])
I would like to return 2
For np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0])
I would like to return 2
For np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0])
I would like to return 5
I have tried casting to string and using string.count(). This works very well, but I need a solution that is a lot quicker. I am hitting this function millions of times a minute.
At present, I am looping over the array, which is slow, but surprisingly far quicker (about 4x) than casting to string (I know casting is slow and string manipulation is slower still).
Any ideas would be appreciated.
Regarding the itertools suggestion, I wrote a small performance check:
import numpy as np
import itertools
import time

def itertools_solution(full_array):
    # True where the element is 0
    trueFalse = full_array == 0
    # Length of every consecutive run of zeros
    count = [ sum( 1 for _ in group ) for key, group in itertools.groupby( trueFalse ) if key ]
    # Keep only the runs of at least 3 zeros
    above = [val for val in count if val >= 3]
    return len(above)

def looping_solution(full_array):
    total_count = 0
    running_count = 0
    for counter, val in enumerate(full_array):
        if val == 0:
            running_count += 1
            # Count each run once, at the moment it reaches length 3
            if running_count == 3:
                total_count += 1
        else:
            running_count = 0
    return total_count

a = np.array([[0,0,0,0,2,0,0,5,0,0,0,5,5,5],
              [0,0,0,0,0,1,1,0,1,4,0,0,4,4],
              [0,0,1,1,0,0,4,4,4,0,4,0,0,1],
              [3,2,2,3,3,0,0,3,2,6,6,6,0,0],
              [0,1,4,5,0,4,0,0,0,5,0,2,1,0],
              [0,0,3,6,6,6,0,0,0,2,2,3,3,6],
              [2,0,0,2,5,5,5,0,0,0,5,0,0,0],
              [1,3,0,0,1,3,3,6,6,0,0,4,6,0],
              [5,5,5,0,0,2,2,2,5,0,0,0,2,2],
              [6,6,6,0,0,0,6,0,3,3,3,0,0,3],
              [4,4,0,4,4,0,0,1,0,1,1,1,0,0]]).flatten()

time_start = time.time()
for cnt in range(1000):
    itertools_solution(a)
print('itertools took %f seconds' % (time.time() - time_start))

time_start = time.time()
for cnt in range(1000):
    looping_solution(a)
print('looping took %f seconds' % (time.time() - time_start))
with results:
itertools took 0.185000 seconds
looping took 0.038001 seconds
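A single pass of time.time() around 1000 calls can be noisy, so the comparison might be worth repeating with the standard-library timeit module. The following is only a sketch: the helper name best_time_per_call is hypothetical, the loop version is copied here so the snippet runs on its own, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np


def looping_solution(full_array):
    """Plain-loop count of zero runs that reach length 3."""
    total_count = 0
    running_count = 0
    for val in full_array:
        if val == 0:
            running_count += 1
            if running_count == 3:
                total_count += 1
        else:
            running_count = 0
    return total_count


def best_time_per_call(func, data, number=1000, repeat=5):
    """Hypothetical helper: fastest observed time per call, in seconds."""
    totals = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
    return min(totals) / number


# First example array from the question
sample = np.array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0])

per_call = best_time_per_call(looping_solution, sample)
print('looping: %.2f microseconds per call' % (per_call * 1e6))
```

Taking the minimum over several repeats is a common way to reduce the influence of other processes on the measurement.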
Although the itertools version works, it unfortunately does not address my performance issue.
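For comparison, the same count can also be sketched with NumPy operations only, by locating where runs of zeros start and end. This is a minimal sketch, not a benchmarked answer: the function name count_zero_runs and the min_len parameter are made up for illustration, and whether it beats the plain loop on arrays of length around 100 would have to be timed.

```python
import numpy as np


def count_zero_runs(arr, min_len=3):
    """Count runs of consecutive zeros that are at least min_len long."""
    is_zero = np.asarray(arr) == 0
    # Pad with False on both sides so that runs touching either edge
    # still produce a start and an end marker.
    padded = np.concatenate(([False], is_zero, [False]))
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)   # index of the first zero in each run
    ends = np.flatnonzero(changes == -1)    # index one past the last zero in each run
    return int(np.count_nonzero(ends - starts >= min_len))


# The three examples from the question
print(count_zero_runs(np.array([0,0,0,0,1,0,0,1,0,0,0])))                          # 2
print(count_zero_runs(np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0])))  # 2
print(count_zero_runs(np.array([0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0])))          # 5
```

Each NumPy call carries a fixed overhead, so on very short arrays the gain over a Python loop may be small or absent.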