9

having an array like this for example:

[1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]

What's the fastest way in Python to get the non-zero elements organized in a list where each element contains the indexes of blocks of continuous non-zero values?

Here the result would be a list containing many arrays:

([0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21])
user2314737
  • 27,088
  • 20
  • 102
  • 114
Mehdi
  • 1,370
  • 3
  • 15
  • 26
  • 1
    Look into to [`enumerate()`](https://docs.python.org/2/library/functions.html#enumerate) and this will become obvious. – kylieCatt Jul 21 '15 at 16:11
  • 1
    Related: [How to slice list into contiguous groups of non-zero integers in Python](http://stackoverflow.com/questions/6760871/how-to-slice-list-into-contiguous-groups-of-non-zero-integers-in-python) – user2314737 Jan 10 '17 at 23:13

4 Answers4

13
>>> L = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
>>> import itertools
>>> import operator
>>> [[i for i,value in it] for key,it in itertools.groupby(enumerate(L), key=operator.itemgetter(1)) if key != 0]

[[0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21]]
ovgolovin
  • 13,063
  • 6
  • 47
  • 78
  • Hi @ovgolovin, thanks for this answer. Could you please explain the steps in some more detail? – S.A. May 13 '20 at 11:24
  • 1
    @S.A. All the heavy lifting is done here by `itertools.groupby`. See examples in the [docs](https://docs.python.org/3.8/library/itertools.html#itertools.groupby) on how to use it. Everything else is just to adapt the data for `groupby` in the right way. First I prepended all items with index by using `enumerate`. Then grouped them by value with `groupby` with `key` parameter which selects value from `(index, value)` tuple. And then iterate through those groups, where each group is unpacked in for-loop to `(key,it)`, then filter to only groups with non-zero value `key != 0`, and output indices. – ovgolovin May 13 '20 at 23:05
  • 1
    @S.A. You can use debug prints to see intermediate values. https://ideone.com/8feFOq – ovgolovin May 13 '20 at 23:20
3

A trivial change to my answer at Finding the consecutive zeros in a numpy array gives the function find_runs:

def find_runs(value, a):
    # Create an array that is 1 where a is `value`, and pad each end with an extra 0.
    isvalue = np.concatenate(([0], np.equal(a, value).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isvalue))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

For example,

In [43]: x
Out[43]: array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])

In [44]: find_runs(1, x)
Out[44]: 
array([[ 0,  4],
       [ 9, 12],
       [14, 16],
       [20, 22]])

In [45]: [range(*run) for run in find_runs(1, x)]
Out[45]: [[0, 1, 2, 3], [9, 10, 11], [14, 15], [20, 21]]

If the value 1 in your example was not representative, and you really want runs of any non-zero values (as suggested by the text of the question), you can change np.equal(a, value) to (a != 0) and change the arguments and comments appropriately. E.g.

def find_nonzero_runs(a):
    # Create an array that is 1 where a is nonzero, and pad each end with an extra 0.
    isnonzero = np.concatenate(([0], (np.asarray(a) != 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isnonzero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

For example,

In [63]: y
Out[63]: 
array([-1,  2, 99, 99,  0,  0,  0,  0,  0, 12, 13, 14,  0,  0,  1,  1,  0,
        0,  0,  0, 42, 42])

In [64]: find_nonzero_runs(y)
Out[64]: 
array([[ 0,  4],
       [ 9, 12],
       [14, 16],
       [20, 22]])
Community
  • 1
  • 1
Warren Weckesser
  • 110,654
  • 19
  • 194
  • 214
2

Have a look at scipy.ndimage.measurements.label:

import numpy as np
from scipy.ndimage.measurements import label

x = np.asarray([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])
labelled, numfeats = label(x)
indices = [np.nonzero(labelled == k) for k in np.unique(labelled)[1:]]

indices contains exactly what you asked for. Note that, depending on your ultimate goal, labelled might also give you useful (extra) information.

EelkeSpaak
  • 2,757
  • 2
  • 17
  • 37
2

You can use np.split, once you know the interval of non-zeros' lengths and the corresponding indices in A. Assuming A as the input array, the implementation would look something like this -

# Append A on either sides with zeros
A_ext = np.diff(np.hstack(([0],A,[0])))

# Find interval of non-zeros lengths
interval_lens = np.where(A_ext==-1)[0] - np.where(A_ext==1)[0]

# Indices of non-zeros places in A
idx = np.arange(A.size)[A!=0]

# Finally split indices based on the interval lengths
out = np.split(idx,interval_lens.cumsum())[:-1]

Sample input, output -

In [53]: A
Out[53]: array([1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1])

In [54]: out
Out[54]: [array([0, 1, 2, 3]), array([ 9, 10, 11]), array([14, 15]), array([20, 21])]
Divakar
  • 218,885
  • 19
  • 262
  • 358