8

I have a numpy array A. I would like to return the number of zeros before a non-zero in A in an efficient way as it is in a loop.

If A = np.array([0,1,2]) then np.nonzero(A)[0][0] returns 1. However if A = np.array([0,0,0]) this doesn't work (I would like the answer 3 in this case). And also if A is very big and the first non-zero is near the beginning this seems inefficient.

Simd
  • 19,447
  • 42
  • 136
  • 271
  • Related question: http://stackoverflow.com/questions/7632963/numpy-find-first-index-of-value-fast – YXD Apr 23 '14 at 16:38
  • 1
    Related ticket: https://github.com/numpy/numpy/issues/2269 – shx2 Apr 23 '14 at 17:06
  • @shx2 Hmm.. that ticket basically stops 2 years ago and it then points to another ticket which goes silent 10 months ago. – Simd Apr 23 '14 at 18:03

6 Answers6

4

By adding a nonzero number at the end of the array, you can still use np.nonzero to get your desired outcome.

A = np.array([0,1,2])
B = np.array([0,0,0])

np.min(np.nonzero(np.hstack((A, 1))))   # --> 1
np.min(np.nonzero(np.hstack((B, 1))))   # --> 3
tanemaki
  • 4,849
  • 1
  • 12
  • 6
4
i = np.argmax(A!=0)
if i==0 and np.all(A==0): i=len(A)

This should be the most performant solution without extensions. Also easily vectorized to act along multiple axes.

Eelco Hoogendoorn
  • 10,459
  • 1
  • 44
  • 42
3

Here's an iterative Cython version, which may be your best bet if this is a serious bottleneck

# saved as file count_leading_zeros.pyx
import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def count_leading_zeros(np.ndarray[DTYPE_t, ndim=1] a):
    cdef int elements = a.size
    cdef int i = 0
    cdef int count = 0
    while i < elements:
        if a[i] == 0:
            count += 1
        else:
            return count
        i += 1
    return count

This is similar to @mtrw's answer but with indexing at native speeds. My Cython is a bit sketchy so there may be further improvements to be made.

A quick test of an extremely favourable case with IPython with a few different methods

In [1]: import numpy as np

In [2]: import pyximport; pyximport.install()
Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x53e9250>)

In [3]: import count_leading_zeros

In [4]: %paste
def count_leading_zeros_python(x):
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else:
            return ctr
    return ctr
## -- End pasted text --
In [5]: a = np.zeros((10000000,), dtype=np.int)

In [6]: a[5] = 1

In [7]: 

In [7]: %timeit np.min(np.nonzero(np.hstack((a, 1))))
10 loops, best of 3: 91.1 ms per loop

In [8]: 

In [8]: %timeit np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
10 loops, best of 3: 107 ms per loop

In [9]: 

In [9]: %timeit count_leading_zeros_python(a)
100000 loops, best of 3: 3.87 µs per loop

In [10]: 

In [10]: %timeit count_leading_zeros.count_leading_zeros(a)
1000000 loops, best of 3: 489 ns per loop

However I'd only use something like this if I had evidence (with a profiler) that this was a bottleneck. Many things may seem inefficient but are never worth your time to fix.

YXD
  • 31,741
  • 15
  • 75
  • 115
  • +1 for testing. Interesting that even with so few iterations, the Python loop is ~10x slower than the Cython one. If the non-zero element was later in the large array, the Cython advantage would be even greater. – mtrw Apr 23 '14 at 21:21
1

What's wrong with the naive approach:

def countLeadingZeros(x):
""" Count number of elements up to the first non-zero element, return that count """
    ctr = 0
    for k in x:
        if k == 0:
            ctr += 1
        else: #short circuit evaluation, we found a non-zero so return immediately
            return ctr
    return ctr #we get here in the case that x was all zeros

This returns as soon as a non-zero element is found, so it is O(n) in the worst case. You could make it faster by porting it to C, but it would be worth testing to see if that is really necessary for the arrays you're working with.

mtrw
  • 34,200
  • 7
  • 63
  • 71
1

I am surprised why nobody has not used np.where yet

np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0 else np.shape(a)[0] will do the trick

>> a = np.array([0,1,2])
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 1
>> a = np.array([0,0,0))
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 3
>> a = np.array([1,2,3))
>> np.where(a)[0][0] if np.shape(np.where(a)[0])[0] != 0  else np.shape(a)[0]
... 0
Anoop
  • 5,540
  • 7
  • 35
  • 52
-2

If you don't care about the speed, I have a small trick to do the job:

a = np.array([0,0,1,1,1])
t = np.where(a==0,1,0)+np.append(np.where(a==0,0,1),0)[1:]
print t
[1 2 1 1 0]
np.where(t==2)
(array([1]),)
the
  • 21,007
  • 11
  • 68
  • 101