Counting zeros in a rolling window of a numpy array (including NaNs)

Question

I am trying to find a way of counting zeros in a rolling window using a numpy array.

Using pandas, I can get it with:

(df['demand'] == 0).rolling(7).sum().fillna(0)

or

df['demand'].rolling(7).apply(lambda x: 7 - np.count_nonzero(x)).fillna(0)

In NumPy, using the code from here:

import numpy as np

def rolling_window(a, window_size):
    # Strided view: row i is a[i:i+window_size] (no data is copied)
    shape = (a.shape[0] - window_size + 1, window_size) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

arr = np.asarray([10, 20, 30, 5, 6, 0, 0, 0])

np.count_nonzero(rolling_window(arr==0, 7), axis=1)

Output:
    array([2, 3])

However, I also need the first 6 positions (the ones pandas leaves as NaN), filled with zeros:

Expected output:

array([0, 0, 0, 0, 0, 0, 2, 3])

Divakar · Accepted Answer · 2020-05-29T20:05:16.810

I think an efficient approach would be 1D convolution -

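import numpy as np

def sum_occurences_windowed(arr, W):
    K = np.ones(W, dtype=int)
    # Full convolution: entry i is the number of zeros in arr[i-W+1 : i+1]
    out = np.convolve(arr==0,K)[:len(arr)]
    # The first W-1 windows are incomplete (NaN in pandas); set them to 0
    out[:W-1] = 0
    return out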

Sample run -

In [42]: arr
Out[42]: array([10, 20, 30,  5,  6,  0,  0,  0])

In [43]: sum_occurences_windowed(arr,W=7)
Out[43]: array([0, 0, 0, 0, 0, 0, 2, 3])
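Why does slicing the full convolution to `len(arr)` give trailing-window counts? With z = (arr == 0) and an all-ones kernel of length W, entry i of the full convolution is the sum of z[k] for k from i-W+1 to i, which is exactly the window ending at i. A quick sanity check against an explicit loop, sketched below, compares the two on random data. The helper `naive_count` is hypothetical, written only for this comparison.

```python
import numpy as np

def sum_occurences_windowed(arr, W):
    K = np.ones(W, dtype=int)
    out = np.convolve(arr == 0, K)[:len(arr)]
    out[:W-1] = 0
    return out

def naive_count(arr, W):
    # Hypothetical reference: count zeros in each trailing window arr[i-W+1 : i+1]
    out = np.zeros(len(arr), dtype=int)
    for i in range(W - 1, len(arr)):
        out[i] = np.count_nonzero(arr[i - W + 1:i + 1] == 0)
    return out

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=50)
conv_result = sum_occurences_windowed(x, 7)
loop_result = naive_count(x, 7)
print(np.array_equal(conv_result, loop_result))
```

The loop is O(n·W), while `np.convolve` performs the same sums in compiled code, which is where the speed-up comes from.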

Timings on arrays of varying length with a window of 7

Including count_rolling from @Quang Hoang's post.

Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.

import benchit
funcs = [sum_occurences_windowed, count_rolling]
in_ = {n:(np.random.randint(0,5,(n)),7) for n in [10,20,50,100,200,500,1000,2000,5000]}
t = benchit.timings(funcs, in_, multivar=True, input_name='Length')
t.plot(logx=True, save='timings.png')
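If benchit is not installed, a rough comparison can be sketched with the standard-library `timeit` module instead. This is only an illustration: the array lengths and the repeat count of 200 are arbitrary choices, `count_rolling` is copied from @Quang Hoang's answer, and absolute timings will depend on the machine.

```python
import timeit
import numpy as np

def sum_occurences_windowed(arr, W):
    # Convolution approach from this answer
    K = np.ones(W, dtype=int)
    out = np.convolve(arr == 0, K)[:len(arr)]
    out[:W-1] = 0
    return out

def count_rolling(a, window_size):
    # Strided-view approach from @Quang Hoang's answer
    shape = (a.shape[0] - window_size + 1, window_size) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    rolling = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    out = np.zeros_like(a)
    out[window_size-1:] = (rolling == 0).sum(1)
    return out

results = {}
for n in [10, 100, 1000, 5000]:
    arr = np.random.randint(0, 5, n)
    # Both approaches should agree before their timings are compared
    assert np.array_equal(sum_occurences_windowed(arr, 7), count_rolling(arr, 7))
    t_conv = timeit.timeit(lambda: sum_occurences_windowed(arr, 7), number=200)
    t_strided = timeit.timeit(lambda: count_rolling(arr, 7), number=200)
    results[n] = (t_conv, t_strided)
    print(f"n={n}: convolve {t_conv:.4f}s, strided {t_strided:.4f}s (200 runs)")
```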

Extending to generic n-dim arrays

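import numpy as np
from scipy.ndimage import convolve1d

def sum_occurences_windowed_ndim(arr, W, axis=-1):
    K = np.ones(W, dtype=int)
    # origin=-(W//2) shifts the kernel so that out[i] sums the trailing window i-W+1..i
    out = convolve1d((arr==0).astype(int),K,axis=axis,origin=-(W//2))
    # Zero the first W-1 (incomplete) positions along `axis`
    out.swapaxes(axis,0)[:W-1] = 0
    return out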

So, on a 2D array, use axis=1 to count along each row, axis=0 to count along each column, and so on.

Sample run -

In [155]: np.random.seed(0)

In [156]: a = np.random.randint(0,3,(3,10))

In [157]: a
Out[157]: 
array([[0, 1, 0, 1, 1, 2, 0, 2, 0, 0],
       [0, 2, 1, 2, 2, 0, 1, 1, 1, 1],
       [0, 1, 0, 0, 1, 2, 0, 2, 0, 1]])

In [158]: sum_occurences_windowed_ndim(a, W=7)
Out[158]: 
array([[0, 0, 0, 0, 0, 0, 3, 2, 3, 3],
       [0, 0, 0, 0, 0, 0, 2, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 4, 3, 4, 3]])

# Verify with earlier 1D solution
In [159]: np.vstack([sum_occurences_windowed(i,7) for i in a])
Out[159]: 
array([[0, 0, 0, 0, 0, 0, 3, 2, 3, 3],
       [0, 0, 0, 0, 0, 0, 2, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 4, 3, 4, 3]])

Let's test out our original 1D input array -

In [187]: arr
Out[187]: array([10, 20, 30,  5,  6,  0,  0,  0])

In [188]: sum_occurences_windowed_ndim(arr, W=7)
Out[188]: array([0, 0, 0, 0, 0, 0, 2, 3])
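For readers without SciPy, roughly the same axis-wise count can be sketched with NumPy alone, assuming NumPy 1.20 or newer for `np.lib.stride_tricks.sliding_window_view`. This swaps the convolution for an explicit windowed sum, and the function name `count_zeros_windowed_ndim` is hypothetical. It reproduces the 2D sample run above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_zeros_windowed_ndim(arr, W, axis=-1):
    # Hypothetical NumPy-only variant of sum_occurences_windowed_ndim.
    # The view has arr's shape with shape[axis] reduced to n - W + 1,
    # plus a trailing axis of length W holding each window.
    counts = sliding_window_view(arr == 0, W, axis=axis).sum(axis=-1)
    out = np.zeros(arr.shape, dtype=counts.dtype)
    # Complete windows end at positions W-1 .. n-1 along `axis`; earlier ones stay 0
    idx = [slice(None)] * arr.ndim
    idx[axis] = slice(W - 1, None)
    out[tuple(idx)] = counts
    return out

a = np.array([[0, 1, 0, 1, 1, 2, 0, 2, 0, 0],
              [0, 2, 1, 2, 2, 0, 1, 1, 1, 1],
              [0, 1, 0, 0, 1, 2, 0, 2, 0, 1]])
print(count_zeros_windowed_ndim(a, 7, axis=1))
```

Like the SciPy version, passing `axis=0` counts down the columns instead.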

Excellent use of convolution here. Although some modification is needed for it to work on a 2D array, as the OP commented under my solution. — Quang Hoang, May 29 '20 at 19:39
is there any difference if using `correlate` instead of `convolve`? — Andy L., May 29 '20 at 19:58
@AndyL. I think your question is more whether correlation is the same as convolution? For counting, the kernel is all 1s, so theoretically they should be the same. But convolution seems to be the go-to way for these counting purposes, etc. — Divakar, May 29 '20 at 20:12
yeah, I know the differences between `convolve` and `correlate`. In this case, with a kernel of all `1`s, `correlate` seems more suitable since it doesn't reverse the kernel. However, I have seen `convolve` used more frequently, even in cases like this one where `correlate` would be more natural. That's why I ask. Thanks for the answer :) +1 — Andy L., May 29 '20 at 21:35
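The point in these comments can be checked directly: reversing an all-ones kernel changes nothing, so `np.convolve` and `np.correlate` agree once both use `mode='full'`. One practical difference worth knowing is that `np.correlate` defaults to `mode='valid'`, which returns only the complete windows.

```python
import numpy as np

arr = np.asarray([10, 20, 30, 5, 6, 0, 0, 0])
z = (arr == 0).astype(int)   # 1 where arr is zero
K = np.ones(7, dtype=int)    # symmetric all-ones kernel

conv = np.convolve(z, K)                # np.convolve defaults to mode='full'
corr = np.correlate(z, K, mode='full')  # np.correlate defaults to mode='valid'
print(np.array_equal(conv, corr))       # True: flipping an all-ones kernel has no effect

# With its default 'valid' mode, correlate returns only the complete windows
print(np.correlate(z, K))               # [2 3]
```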

score 1 · Answer 2 · answered May 29 '20 at 18:59


I would modify the function as follows:

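import numpy as np

def count_rolling(a, window_size):
    # Strided view of every complete window along axis 0 (no data is copied)
    shape = (a.shape[0] - window_size + 1, window_size) + a.shape[1:]

    strides = (a.strides[0],) + a.strides
    rolling = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

    # Zero-filled output; the first window_size-1 (incomplete) positions stay 0
    out = np.zeros_like(a)
    out[window_size-1:] = (rolling == 0).sum(1)
    return out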

arr = np.asarray([10, 20, 30, 5, 6, 0, 0, 0])
count_rolling(arr,7)

Output:

array([0, 0, 0, 0, 0, 0, 2, 3])
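On NumPy 1.20 or newer, the hand-built `as_strided` view can be replaced by `np.lib.stride_tricks.sliding_window_view`, which computes the shape and strides itself and returns a read-only view, so a wrong stride cannot silently read outside the array. The sketch below is a variant of `count_rolling` under that assumption; the name `count_rolling_swv` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_rolling_swv(a, window_size):
    # Hypothetical variant of count_rolling: windows run along axis 0 and the
    # window axis is appended last, giving shape (n - window_size + 1, ..., window_size)
    windows = sliding_window_view(a == 0, window_size, axis=0)
    counts = windows.sum(axis=-1)
    # Leading window_size-1 positions (NaN in pandas) stay 0
    out = np.zeros(a.shape, dtype=counts.dtype)
    out[window_size-1:] = counts
    return out

arr = np.asarray([10, 20, 30, 5, 6, 0, 0, 0])
print(count_rolling_swv(arr, 7))  # [0 0 0 0 0 0 2 3]
```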


Quang Hoang


Thank you very much @Quang Hoang. Just a quick question: I also tried it on a larger array of shape (100, 8), and it returned an output, but the last results look a bit strange. To apply this function to a higher-dimensional array, do I have to change it to axis=1 somewhere? – William May 29 '20 at 19:21

@William If you are counting along each row, you need some transposing back and forth: `count_rolling(arr.T,7).T`, unless Quang Hoang decides to roll out another version specific to that. – Divakar May 29 '20 at 20:15
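`count_rolling` builds its windows along axis 0, so on a 2D array it counts down each column, which explains the unexpected results on the (100, 8) array. The transpose trick from the comment above can be checked against applying the 1D function to each row separately; the random test array here is an arbitrary choice for illustration.

```python
import numpy as np

def count_rolling(a, window_size):
    # Copied from the answer above: windows run along axis 0
    shape = (a.shape[0] - window_size + 1, window_size) + a.shape[1:]
    strides = (a.strides[0],) + a.strides
    rolling = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    out = np.zeros_like(a)
    out[window_size-1:] = (rolling == 0).sum(1)
    return out

a = np.random.default_rng(1).integers(0, 3, size=(4, 12))

# Row-wise: transpose so rows lie along axis 0, count, then transpose back
by_row = count_rolling(a.T, 7).T
# Reference: run the 1D version on each row separately
ref = np.vstack([count_rolling(row, 7) for row in a])
print(np.array_equal(by_row, ref))
```

`as_strided` reuses the transposed array's own strides, so no copy is needed for `a.T`.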
