0

I have an arbitrary array with only binary values- say:

a = np.array([1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,0,])

What would be the most efficient way to count average length of sequences of 1s array? Eg. In this example, it would be (1 + 8 + 2)/3.

LogCapy
  • 447
  • 7
  • 20
  • 2
    Complexity-wise: just iterate over all values and use some active/inactive counter. Combine with a one-pass mean-calculation and you got an O(n) algorithm which is also the lower bound (because you need to look at all elements at least once). – sascha Aug 09 '16 at 01:42
  • I see. Hmm, was wondering if numpy had an internal function which could do pattern counting of some sort. Thank you, though! – LogCapy Aug 09 '16 at 01:51

2 Answers2

2

For an all numpy solution, you can use Alex Martelli's solution like so:

def runs_of_ones_array(bits):
    # make sure all runs of ones are well-bounded
    bounded = np.hstack(([0], bits, [0]))
    # get 1 at run starts and -1 at run ends
    difs = np.diff(bounded)
    run_starts, = np.where(difs > 0)
    run_ends, = np.where(difs < 0)
    return run_ends - run_starts

>>> a=np.array([1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,0,])
>>> b=runs_of_ones_array(a)
>>> float(sum(b))/len(b)
3.66666666667
Community
  • 1
  • 1
dawg
  • 98,345
  • 23
  • 131
  • 206
1

I'm not sure easiest, but one alternative is

np.mean([len(list(v)) for k,v in itertools.groupby(a) if k])

3.6666666666666665

Explanation groupby groups adjacent equal values together, we filter only ones (if k), the list comprehension [...] creates the list of the lengths of sub sequences of ones, i.e. [1,8,2], and mean computes the average value.

karakfa
  • 66,216
  • 7
  • 41
  • 56