Average Number of Repeating Patterns with Numpy

Question

I have an arbitrary array with only binary values- say:

a = np.array([1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,0,])

What would be the most efficient way to count average length of sequences of 1s array? Eg. In this example, it would be (1 + 8 + 2)/3.

Complexity-wise: just iterate over all values and use some active/inactive counter. Combine with a one-pass mean-calculation and you got an O(n) algorithm which is also the lower bound (because you need to look at all elements at least once). — sascha, Aug 09 '16 at 01:42
I see. Hmm, was wondering if numpy had an internal function which could do pattern counting of some sort. Thank you, though! — LogCapy, Aug 09 '16 at 01:51

score 2 · Answer 1 · edited May 23 '17 at 10:28

For an all numpy solution, you can use Alex Martelli's solution like so:

def runs_of_ones_array(bits):
    # make sure all runs of ones are well-bounded
    bounded = np.hstack(([0], bits, [0]))
    # get 1 at run starts and -1 at run ends
    difs = np.diff(bounded)
    run_starts, = np.where(difs > 0)
    run_ends, = np.where(difs < 0)
    return run_ends - run_starts

>>> a=np.array([1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,0,])
>>> b=runs_of_ones_array(a)
>>> float(sum(b))/len(b)
3.66666666667

karakfa · Accepted Answer · 2016-08-09T02:07:19.997

1

I'm not sure easiest, but one alternative is

np.mean([len(list(v)) for k,v in itertools.groupby(a) if k])

3.6666666666666665

Explanation groupby groups adjacent equal values together, we filter only ones (if k), the list comprehension [...] creates the list of the lengths of sub sequences of ones, i.e. [1,8,2], and mean computes the average value.

edited Aug 09 '16 at 02:07

answered Aug 09 '16 at 01:52

karakfa

66,216
7
41
56

This one looks really nice. Thank you! – LogCapy Aug 09 '16 at 01:54
Sorry, but do you think you could explain what is going on in this code? I have never used itertools before. – LogCapy Aug 09 '16 at 02:01

Average Number of Repeating Patterns with Numpy

2 Answers2