23

I have the following array:

a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7,0,10,11]

I would like to find the start and end index of each run of consecutive zeros in the array. For the array above, the output would be as follows:

[3,8],[12,15],[19]

I want to achieve this as efficiently as possible.

jtlz2
Shan

3 Answers

53

Here's a fairly compact vectorized implementation. I've changed the requirements a bit, so the return value is a bit more "numpythonic": it creates an array with shape (m, 2), where m is the number of "runs" of zeros. The first column is the index of the first 0 in each run, and the second is the index of the first nonzero element after the run. (This indexing pattern matches, for example, how slicing works and how the range function works.)

import numpy as np

def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

For example:

In [236]: a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]

In [237]: runs = zero_runs(a)

In [238]: runs
Out[238]: 
array([[ 3,  9],
       [12, 16],
       [19, 20]])

With this format, it is simple to get the number of zeros in each run:

In [239]: runs[:,1] - runs[:,0]
Out[239]: array([6, 4, 1])
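If the inclusive end indices from the question are needed instead, one option is to subtract 1 from the second column. The following is a minimal sketch, not a definitive implementation: the name `zero_runs_inclusive` is hypothetical, and it uses a boolean mask so that `np.diff` marks the transitions directly (for boolean input, `np.diff` compares neighbours for inequality).

```python
import numpy as np

def zero_runs_inclusive(a):
    # Hypothetical helper: like zero_runs, but the second column is the
    # index of the last zero in each run (inclusive) rather than one past it.
    # Pad the boolean mask with False so runs touching either edge
    # still produce both a start and an end transition.
    iszero = np.concatenate(([False], np.asarray(a) == 0, [False]))
    # np.diff on booleans is True wherever neighbouring entries differ.
    edges = np.flatnonzero(np.diff(iszero)).reshape(-1, 2)
    # Convert half-open [start, stop) pairs to inclusive [start, end].
    edges[:, 1] -= 1
    return edges

a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
print(zero_runs_inclusive(a).tolist())  # [[3, 8], [12, 15], [19, 19]]
```

A run of length one still gets two columns here (`[19, 19]`), which keeps the result a regular (m, 2) array.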

It's always a good idea to check the edge cases:

In [240]: zero_runs([0,1,2])
Out[240]: array([[0, 1]])

In [241]: zero_runs([1,2,0])
Out[241]: array([[2, 3]])

In [242]: zero_runs([1,2,3])
Out[242]: array([], shape=(0, 2), dtype=int64)

In [243]: zero_runs([0,0,0])
Out[243]: array([[0, 3]])
Warren Weckesser
  • Any way to do that with pandas? – toine Nov 29 '15 at 18:01
  • Nice solution! Is there a reason for using the `np.int8` data type in `iszero`? I think if we simply use booleans we can also avoid `np.abs()` and simply set `absdiff = np.diff(iszero)`. Am I missing something? – MikeL Jan 03 '20 at 09:57
  • This is a clever solution. I modified it to count the number of ties in the Mann-Kendall trend test https://stackoverflow.com/a/68442829/2005415 – Jason Jul 20 '21 at 00:06
  • Oneliner: `np.ediff1d(np.r_[0, a == 0, 0]).nonzero()[0].reshape(-1, 2)` – Joren Feb 21 '23 at 00:51
1

You can use itertools to achieve your expected result.

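from itertools import groupby
a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
b = range(len(a))
# Group consecutive indices by the value stored at each index.
for value, indices in groupby(b, lambda x: a[x]):
    if value == 0:
        lis = list(indices)
        print([min(lis), max(lis)])  # [start, end] of the run, both inclusive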
cs95
rajeshv90
  • This will return `[19, 19]`; I think OP expects just `[19]`. And instead of creating an unnecessary list `b`, try to use `enumerate(a)`. – Ashwini Chaudhary Jul 22 '14 at 12:13
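The two suggestions in the comment above can be sketched as follows. This is only one possible reading, under the assumption that single-element runs should be reported as `[i]` as in the question; the function name `zero_runs_list` is hypothetical.

```python
from itertools import groupby

def zero_runs_list(a):
    # Hypothetical helper: returns [start, end] (inclusive) for each run of
    # zeros, or [start] when the run contains a single element.
    runs = []
    # enumerate yields (index, value) pairs; groupby splits them into
    # maximal stretches over which "value == 0" stays the same.
    for is_zero, group in groupby(enumerate(a), key=lambda pair: pair[1] == 0):
        if is_zero:
            idx = [i for i, _ in group]
            runs.append([idx[0]] if len(idx) == 1 else [idx[0], idx[-1]])
    return runs

a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
print(zero_runs_list(a))  # [[3, 8], [12, 15], [19]]
```

Because the result mixes one- and two-element lists, it matches the requested output but is less convenient for further array processing than a fixed two-column layout.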
-1

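Here is a custom function. I'm not sure it's the most efficient, but it works:

def getZeroIndexes(li):
  begin = 0
  end = 0
  indexes = []
  zero = False  # True while we are inside a run of zeros
  for ind, elt in enumerate(li):
    if not elt and not zero:
      # First zero of a new run.
      begin = ind
      zero = True
    if not elt and zero:
      end = ind
    if elt and zero:
      # The run just ended: record a single index or a (begin, end) pair.
      zero = False
      indexes.append(begin if begin == end else (begin, end))
  # Record a run that reaches the end of the list.
  if zero:
    indexes.append(begin if begin == end else (begin, end))
  return indexes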
Bestasttung