Count number of clusters of non-zero values in Python?

Question

My data looks something like this:

a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]

Essentially, there's a bunch of zeroes before non-zero numbers and I am looking to count the number of groups of non-zero numbers separated by zeros. In the example data above, there are 3 groups of non-zero data so the code should return 3.

The number of zeros between groups of non-zeros varies.

Are there any good ways to do this in Python? (I am also using Pandas and NumPy to help parse the data.)

If you had it in a Series (or DataFrame), you could do: `((ser!=0)&(ser.shift()==0)).sum()` – JohnE, Jan 02 '17 at 06:41
Related: [Extract separate non-zero blocks from array](http://stackoverflow.com/questions/31544129/extract-separate-non-zero-blocks-from-array) and [How to slice list into contiguous groups of non-zero integers in Python](http://stackoverflow.com/questions/6760871/how-to-slice-list-into-contiguous-groups-of-non-zero-integers-in-python) — user2314737, Jan 10 '17 at 23:19
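
A minimal pandas sketch of the Series idea from the comment, assuming pandas 0.24 or later (where `Series.shift` accepts `fill_value`). With a plain `ser.shift()` the first slot becomes NaN, and NaN == 0 is False, so a Series that starts with a non-zero value would not have that first cluster counted; filling the shifted slot with 0 avoids this. The function name `count_clusters_series` is hypothetical.

```python
import pandas as pd


def count_clusters_series(ser):
    """Count runs of consecutive non-zero values in a pandas Series."""
    nonzero = ser != 0
    # Treat the (non-existent) value before the first element as 0, so a
    # leading non-zero run still registers as the start of a cluster.
    prev_is_zero = ser.shift(fill_value=0) == 0
    # A cluster starts wherever the current value is non-zero and the previous one is zero.
    return int((nonzero & prev_is_zero).sum())


ser = pd.Series([0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0, 0,
                 6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7, 1])
print(count_clusters_series(ser))                   # 3
print(count_clusters_series(pd.Series([7, 0, 1])))  # 2
```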

Divakar · Accepted Answer · 2017-01-01T07:36:37.433

With a as the input array, we could have a vectorized solution -

m = a!=0
out = (m[1:] > m[:-1]).sum() + m[0]

Alternatively, for performance, we might use np.count_nonzero, which is very efficient at counting booleans (as we have here), like so -

out = np.count_nonzero(m[1:] > m[:-1]) + m[0]

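Basically, we get a mask of non-zeros and count its rising edges, i.e. the positions where the mask steps from False to True; m[1:] > m[:-1] is True exactly at those positions. The first element can be non-zero too, and a cluster starting there has no rising edge, so we check it separately and add m[0] to the total.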

Also, please note that if input a is a list, we need to use m = np.asarray(a)!=0 instead.
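
Putting the pieces above together, here is a minimal self-contained sketch that accepts either a list or an array, uses np.count_nonzero for the rising edges, and adds a guard for empty input (the guard and the function name `count_nonzero_clusters` are additions for illustration, not part of the answer):

```python
import numpy as np


def count_nonzero_clusters(a):
    """Count runs of consecutive non-zero values in a 1-D list or array."""
    m = np.asarray(a) != 0          # boolean mask of non-zero positions
    if m.size == 0:
        return 0                    # guard: m[0] below would fail on empty input
    # Rising edges: positions where the mask steps from False to True.
    rising = np.count_nonzero(m[1:] > m[:-1])
    # A cluster that starts at index 0 has no rising edge, so add m[0] (0 or 1).
    return int(rising + m[0])


a = [0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0, 0,
     6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7, 1]
print(count_nonzero_clusters(a))            # 3
print(count_nonzero_clusters([7] + a[1:]))  # 4
```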

Sample runs for three cases -

In [92]: a  # Case 1: given sample
Out[92]: 
array([ 0,  0,  0,  0,  0,  0, 10, 15, 16, 12, 11,  9, 10,  0,  0,  0,  0,
        0,  6,  9,  3,  7,  5,  4,  0,  0,  0,  0,  0,  0,  4,  3,  9,  7,
        1])

In [93]: m = a!=0

In [94]: (m[1:] > m[:-1]).sum() + m[0]
Out[94]: 3

In [95]: a[0] = 7  # Case 2: add a non-zero element/group at the start

In [96]: m = a!=0

In [97]: (m[1:] > m[:-1]).sum() + m[0]
Out[97]: 4

In [99]: a[-2:] = [0,4]  # Case 3: add a non-zero group at the end

In [100]: m = a!=0

In [101]: (m[1:] > m[:-1]).sum() + m[0]
Out[101]: 5

Moinuddin Quadri · Answer 2 · 2016-12-31T22:46:31.060


You can achieve this using itertools.groupby() with a list comprehension:

>>> from itertools import groupby

>>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
3
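
The same groupby idea can be written without building an intermediate list. Passing key=bool works here because bool(0) is False and bool(x) is True for any non-zero number, so the groups alternate between zero runs and non-zero runs. Since groupby consumes its input lazily, this sketch also works on generators and other one-pass iterables. The function name `count_clusters_groupby` is hypothetical.

```python
from itertools import groupby


def count_clusters_groupby(values):
    """Count runs of consecutive non-zero values in any iterable."""
    # groupby(values, key=bool) yields alternating (False, zeros) and
    # (True, non-zeros) groups; count only the non-zero ones.
    return sum(1 for is_nonzero, _ in groupby(values, key=bool) if is_nonzero)


a = [0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0, 0,
     6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7, 1]
print(count_clusters_groupby(a))                        # 3
print(count_clusters_groupby(x % 3 for x in range(7)))  # 0,1,2,0,1,2,0 -> 2
```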

edited Dec 31 '16 at 22:46

answered Dec 31 '16 at 22:06

Moinuddin Quadri


This is good if you are using an iterable that is not easily convertible to a Numpy array. – David Z Dec 31 '16 at 22:29
Nah, I think OP won't have such a condition. – Divakar Dec 31 '16 at 22:48
Oh, I meant the general `groupby` approach - I actually didn't even notice your mistake. Anyway my point is that, since the question asks about a situation where Numpy is available, the vectorized solution is better, but this is a good one to know about for other situations. – David Z Dec 31 '16 at 22:48

score 3 · Answer 3 · answered Dec 31 '16 at 22:06

A simple pure-Python solution: count changes from 0 to non-zero by keeping track of the previous value (rising-edge detection):

a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]

previous = 0
count = 0
for c in a:
    if previous==0 and c!=0:
        count+=1
    previous = c

print(count)  # 3
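
The same rising-edge check can also be phrased with zip, pairing each value with its predecessor. Prepending a 0 as the "previous" value of the first element means a cluster at the very start is counted too. A minimal sketch; the function name `count_clusters_pairwise` is hypothetical:

```python
def count_clusters_pairwise(values):
    """Count 0 -> non-zero transitions, treating the value before the start as 0."""
    values = list(values)           # materialize once, since we traverse it twice
    previous_values = [0] + values  # previous_values[i] is the value before values[i]
    # zip stops at the shorter input, so each value is paired with its predecessor.
    return sum(1 for prev, cur in zip(previous_values, values)
               if prev == 0 and cur != 0)


a = [0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0, 0,
     6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7, 1]
print(count_clusters_pairwise(a))          # 3
print(count_clusters_pairwise([7, 0, 1]))  # 2
```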

piRSquared · Answer 4 · 2017-01-01T00:21:01.863

pad the array with a zero on both sides with np.concatenate
find where it is zero with == 0
find the boundaries between zero and non-zero stretches with np.diff
sum up the boundaries found with sum
divide by two, because each cluster contributes two boundaries (a start and an end)

import numpy as np

def nonzero_clusters(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
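
To see why dividing by two works, here is a trace of the intermediate arrays for the input [0, 1, 2, 0, 1, 2] from the demonstration. It relies on np.diff comparing neighbours with not_equal (rather than subtracting) when given a boolean array, so each True marks a switch between a zero and a non-zero position.

```python
import numpy as np

a = [0, 1, 2, 0, 1, 2]
padded = np.concatenate([[0], a, [0]])  # [0 0 1 2 0 1 2 0]
is_zero = padded == 0                   # [T T F F T F F T]
# For boolean input np.diff uses not_equal, so each True marks a boundary
# between a zero stretch and a non-zero stretch.
boundaries = np.diff(is_zero)           # [F T F T T F T]
n_boundaries = int(boundaries.sum())    # 4: one start and one end per cluster
n_clusters = n_boundaries // 2          # 2
print(padded, is_zero, boundaries, n_clusters, sep="\n")
```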

demonstration

nonzero_clusters(
    [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
)

3

nonzero_clusters([0, 1, 2, 0, 1, 2])

2

nonzero_clusters([0, 1, 2, 0, 1, 2, 0])

2

nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])

3

timing
a = np.random.choice((0, 1), 100000)
code

import numpy as np
from itertools import groupby

def div(a):
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]

def pir(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)

def jean(a):
    previous = 0
    count = 0
    for c in a:
        if previous==0 and c!=0:
            count+=1
        previous = c
    return count

def moin(a):
    return len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])

def user(a):
    return sum(1 for n in range(len(a) - 1) if not a[n] and a[n + 1])
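
A minimal harness for re-running the comparison on your own machine, assuming NumPy 1.17 or later (for np.random.default_rng). It uses lightly condensed copies of div, pir, jean and moin so the file runs on its own; user is left out because it assumes the data starts with a zero, which random 0/1 data often does not. The helper name `run_benchmark` is hypothetical, and absolute timings will depend on your hardware and library versions.

```python
import timeit
from functools import partial
from itertools import groupby

import numpy as np


# Lightly condensed copies of div, pir, jean and moin from above.
def div(a):
    m = a != 0
    return int((m[1:] > m[:-1]).sum() + m[0])


def pir(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() // 2)


def jean(a):
    previous, count = 0, 0
    for c in a:
        if previous == 0 and c != 0:
            count += 1
        previous = c
    return count


def moin(a):
    return sum(1 for is_true, _ in groupby(a, lambda x: x != 0) if is_true)


def run_benchmark(n=100_000, number=3, seed=0):
    """Return {name: (cluster_count, seconds_per_call)} for each approach."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=n)  # random 0/1 data, like the timing setup above
    results = {}
    for name, func in [("div", div), ("pir", pir), ("jean", jean), ("moin", moin)]:
        seconds = timeit.timeit(partial(func, a), number=number) / number
        results[name] = (func(a), seconds)
    return results


if __name__ == "__main__":
    for name, (count, seconds) in run_benchmark().items():
        print(f"{name:>4}: {count} clusters, {seconds * 1e3:.2f} ms per call")
```

Checking that all four counts agree before comparing times is a cheap guard against timing a function that returns the wrong answer.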

score 1 · Answer 5 · answered Dec 31 '16 at 22:14


sum(1 for n in range(len(a) - 1) if not a[n] and a[n + 1])

user7342539


What if the first element is non-zero? – Divakar Dec 31 '16 at 22:37
@Divakar `Essentially, there's a bunch of zeroes before non-zero numbers` That's what the OP said. – user7342539 Dec 31 '16 at 22:41
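
One way to handle Divakar's case without relying on that assumption is to let index 0 start a cluster too: an index n begins a cluster when a[n] is non-zero and either n == 0 or a[n - 1] is zero. A minimal sketch in the same index-based style; the function name `count_clusters_indexed` is hypothetical:

```python
def count_clusters_indexed(a):
    """Count runs of consecutive non-zero values in a list, including a leading run."""
    # n starts a cluster if a[n] is non-zero and nothing non-zero precedes it directly.
    return sum(1 for n in range(len(a))
               if a[n] and (n == 0 or not a[n - 1]))


print(count_clusters_indexed([0, 1, 2, 0, 1, 2]))        # 2
print(count_clusters_indexed([1, 2, 0, 1, 2, 0, 1, 2]))  # 3
```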
