Find number of zeros before non-zero in a numpy array

Question

I have a numpy array A. I would like to return the number of zeros before a non-zero in A in an efficient way as it is in a loop.

If A = np.array([0,1,2]) then np.nonzero(A)[0][0] returns 1. However if A = np.array([0,0,0]) this doesn't work (I would like the answer 3 in this case). And also if A is very big and the first non-zero is near the beginning this seems inefficient.

Related question: http://stackoverflow.com/questions/7632963/numpy-find-first-index-of-value-fast — YXD, Apr 23 '14 at 16:38
@shx2 Hmm.. that ticket basically stops 2 years ago and it then points to another ticket which goes silent 10 months ago. — Simd, Apr 23 '14 at 18:03

score 4 · Answer 1 · answered Apr 23 '14 at 16:37

4

By adding a nonzero number at the end of the array, you can still use np.nonzero to get your desired outcome.

A = np.array([0,1,2])
B = np.array([0,0,0])

np.min(np.nonzero(np.hstack((A, 1))))   # --> 1
np.min(np.nonzero(np.hstack((B, 1))))   # --> 3

answered Apr 23 '14 at 16:37

tanemaki

4,849
1
12
6

score 4 · Answer 2 · answered Apr 23 '14 at 18:14

4

i = np.argmax(A!=0)
if i==0 and np.all(A==0): i=len(A)

This should be the most performant solution without extensions. Also easily vectorized to act along multiple axes.

answered Apr 23 '14 at 18:14

Eelco Hoogendoorn

10,459
1
44
42

Strangely this seems a lot slower in my timings. Around 30ms. What do you get? – Simd Apr 24 '14 at 12:54
No idea; as compared to what? I would expect performance to be similar to nonzero, except that we avoid the construction of nonzero's output array. – Eelco Hoogendoorn Apr 24 '14 at 21:07
An interesting comparison would be with the code @Mr-E tests. – Simd Apr 25 '14 at 13:00

YXD · Accepted Answer · 2014-04-23T17:18:20.237

Here's an iterative Cython version, which may be your best bet if this is a serious bottleneck

# saved as file count_leading_zeros.pyx
import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def count_leading_zeros(np.ndarray[DTYPE_t, ndim=1] a):
    cdef int elements = a.size
    cdef int i = 0
    cdef int count = 0
    while i < elements:
        if a[i] == 0:
            count += 1
        else:
            return count
        i += 1
    return count

This is similar to @mtrw's answer but with indexing at native speeds. My Cython is a bit sketchy so there may be further improvements to be made.

A quick test of an extremely favourable case with IPython with a few different methods

In [1]: import numpy as np

In [2]: import pyximport; pyximport.install()
Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x53e9250>)

In [3]: import count_leading_zeros

In [4]: %paste
def count_leading_zeros_python(x):
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else:
            return ctr
    return ctr
## -- End pasted text --
In [5]: a = np.zeros((10000000,), dtype=np.int)

In [6]: a[5] = 1

In [7]: 

In [7]: %timeit np.min(np.nonzero(np.hstack((a, 1))))
10 loops, best of 3: 91.1 ms per loop

In [8]: 

In [8]: %timeit np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
10 loops, best of 3: 107 ms per loop

In [9]: 

In [9]: %timeit count_leading_zeros_python(a)
100000 loops, best of 3: 3.87 µs per loop

In [10]: 

In [10]: %timeit count_leading_zeros.count_leading_zeros(a)
1000000 loops, best of 3: 489 ns per loop

However I'd only use something like this if I had evidence (with a profiler) that this was a bottleneck. Many things may seem inefficient but are never worth your time to fix.

+1 for testing. Interesting that even with so few iterations, the Python loop is ~10x slower than the Cython one. If the non-zero element was later in the large array, the Cython advantage would be even greater. — mtrw, Apr 23 '14 at 21:21

score 1 · Answer 4 · answered Apr 23 '14 at 16:03

What's wrong with the naive approach:

def countLeadingZeros(x):
""" Count number of elements up to the first non-zero element, return that count """
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else: #short circuit evaluation, we found a non-zero so return immediately
            return ctr
    return ctr #we get here in the case that x was all zeros

This returns as soon as a non-zero element is found, so it is O(n) in the worst case. You could make it faster by porting it to C, but it would be worth testing to see if that is really necessary for the arrays you're working with.

Anoop · Answer 5 · 2014-04-23T17:05:36.760

I am surprised why nobody has not used np.where yet

np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0] will do the trick

>> a = np.array([0,1,2])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 1
>> a = np.array([0,0,0))
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 3
>> a = np.array([1,2,3))
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 0

score -2 · Answer 6 · edited Sep 08 '15 at 13:26

-2

If you don't care about the speed, I have a small trick to do the job:

a = np.array([0,0,1,1,1])
t = np.where(a==0,1,0)+np.append(np.where(a==0,0,1),0)[1:]
print t
[1 2 1 1 0]
np.where(t==2)
(array([1]),)

edited Sep 08 '15 at 13:26

the

21,007
11
68
101

answered Sep 08 '15 at 12:49

Raymond Coulson

1
1

Find number of zeros before non-zero in a numpy array

6 Answers6