Get groups of consecutive elements of a NumPy array based on condition

Question

I have a NumPy array as follows:

import numpy as np
a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])

and a constant number b = 6

Based on a previous question, I can count c, the number of runs in which two or more consecutive elements of a are less than b:

from itertools import groupby
b = 6
sum(len(list(g))>=2 for i, g in groupby(a < b) if i)

so in this example c == 3

Now I would like to output an array each time the condition is met instead of counting the number of times the condition is met.

So with this example the right output would be:

array1 = [1, 4, 2]
array2 = [4, 4]
array3 = [3, 4, 4, 5]

since:

1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8  # numbers in a
1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0  # (a<b)
^^^^^^^-----^^^^-----------------------------^^^^^^^^^^---  # (a<b) 2+ times consecutively
   1         2                                    3

So far I have tried different options:

np.isin((len(list(g))>=2 for i, g in groupby(a < b)if i), a)

and

np.extract((len(list(g))>=2 for i, g in groupby(a < b)if i), a)

But neither of them achieves what I am looking for. Can someone point me to the right Python tools to output the different arrays that satisfy my condition?

Georgy · Accepted Answer · 2019-07-05T13:58:16.687

While measuring the performance of my other answer, I noticed that although it was faster than Austin's solution (for arrays of length < 15000), its complexity was not linear.

Based on this answer I came up with the following solution using np.split, which is more efficient than both answers previously posted here:

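array = np.append(a, -np.inf)  # sentinel, so the last chunk also ends with one extra element
mask = array >= b  # values to be removed (b = 6)
split_indices = np.where(mask)[0]
# split right after each removed value: every chunk ends with a removed value or the sentinel
for subarray in np.split(array, split_indices + 1):
    if len(subarray) > 2:  # at least 2 kept values plus the trailing element
        print(subarray[:-1])  # drop the trailing element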

gives:

[1. 4. 2.]
[4. 4.]
[3. 4. 4. 5.]
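
Note that padding with -np.inf upcasts the array to float, which is why the output above shows 1. 4. 2. rather than 1 4 2. If you want to keep the integer dtype, one possible variant (a sketch only, not what I benchmarked; the function name runs_below and the min_len parameter are made up for illustration) is to split wherever the mask flips and keep only the chunks that lie below the threshold:

```python
import numpy as np

def runs_below(a, b, min_len=2):
    """Return runs of at least min_len consecutive elements of a that are < b."""
    a = np.asarray(a)
    mask = a < b
    # positions where the mask flips between True and False
    boundaries = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    chunks = np.split(a, boundaries)
    flags = np.split(mask, boundaries)
    # each chunk is homogeneous, so flag.any() tells whether the whole chunk is < b
    return [chunk for chunk, flag in zip(chunks, flags)
            if flag.any() and flag.size >= min_len]

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
for run in runs_below(a, 6):
    print(run)
# [1 4 2]
# [4 4]
# [3 4 4 5]
```

No padding is needed here because a run at the very end of the array is simply the last chunk.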

[Plot: performance comparison of the answers, measured with perfplot]

Austin · Answer 2 · 2019-07-04T12:55:08.830

Use groupby and grab the groups:

from itertools import groupby

lst = []
b = 6
for i, g in groupby(a, key=lambda x: x < b):
    grp = list(g)
    if i and len(grp) >= 2:
        lst.append(grp)

print(lst)

# [[1, 4, 2], [4, 4], [3, 4, 4, 5]]
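
If you need NumPy arrays rather than Python lists, a possible variation on the same idea (a sketch; the function name groups_as_arrays and the min_len parameter are hypothetical) is to group the boolean mask and track the running position, so that each group can be sliced out of the original array:

```python
from itertools import groupby

import numpy as np

def groups_as_arrays(a, b, min_len=2):
    """Slice runs of >= min_len consecutive elements < b out of a."""
    a = np.asarray(a)
    result = []
    start = 0  # index in a where the current group begins
    for is_below, group in groupby(a < b):
        length = sum(1 for _ in group)  # consume the group to get its length
        if is_below and length >= min_len:
            result.append(a[start:start + length])
        start += length
    return result

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
print(groups_as_arrays(a, 6))
# [array([1, 4, 2]), array([4, 4]), array([3, 4, 4, 5])]
```

The slices are views into a, so no element data is copied.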

edited Jul 04 '19 at 12:55

answered Jul 04 '19 at 12:49

score 1 · Answer 3 · answered Jul 04 '19 at 14:21

This task is very similar to image labeling, except that in your case it is one-dimensional. The SciPy library provides some useful image-processing functionality that we can employ here:

import numpy as np
from scipy.ndimage import (binary_dilation,
                           binary_erosion,
                           label)

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
b = 6  # your threshold
min_consequent_count = 2

mask = a < b
structure = [False] + [True] * min_consequent_count  # used for erosion and dilation
eroded = binary_erosion(mask, structure)
dilated = binary_dilation(eroded, structure)
labeled_array, labels_count = label(dilated)  # labels_count == c

for label_number in range(1, labels_count + 1):  # labeling starts from 1
    subarray = a[labeled_array == label_number]
    print(subarray)

gives:

[1 4 2]
[4 4]
[3 4 4 5]

Explanation:

mask = a < b returns a boolean array with True values where elements are less than the threshold b:

array([ True,  True,  True, False,  True,  True, False,  True, False,
       False,  True, False, False,  True, False,  True,  True,  True,
        True, False])

As you can see, the result contains some True elements that have no True neighbors. To eliminate them we can use binary erosion, here via scipy.ndimage.binary_erosion. Its default structure parameter is not suitable for our needs, as it would also delete runs of two consecutive True values, so I construct my own:
```
>>> structure = [False] + [True] * min_consequent_count
>>> structure
[False, True, True]
>>> eroded = binary_erosion(mask, structure)
>>> eroded
array([ True,  True, False, False,  True, False, False, False, False,
       False, False, False, False, False, False,  True,  True,  True,
       False, False])
```

We managed to remove the single True values, but the remaining groups have each lost an element, so we need to restore their original extent. To do so, we use binary dilation with the same structure:

>>> dilated = binary_dilation(eroded, structure)
>>> dilated
array([ True,  True,  True, False,  True,  True, False, False, False,
       False, False, False, False, False, False,  True,  True,  True,
        True, False])

See the SciPy documentation for scipy.ndimage.binary_dilation.
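
Erosion followed by dilation with the same structure is what is usually called a morphological opening, and scipy.ndimage also offers binary_opening, which performs both steps in one call. The following is only a sketch to check that, for the example above, the one-call version should agree with the two-step version (the variable names opened and two_step are mine):

```python
import numpy as np
from scipy.ndimage import (binary_dilation,
                           binary_erosion,
                           binary_opening,
                           label)

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
b = 6
mask = a < b
structure = np.array([False, True, True])  # same structure as above, min count of 2

# one call: erosion followed by dilation
opened = binary_opening(mask, structure)

# the two explicit steps from the explanation, for comparison
two_step = binary_dilation(binary_erosion(mask, structure), structure)

print(np.array_equal(opened, two_step))

labeled_array, labels_count = label(opened)
print(labels_count)
```

I kept the two explicit steps in the answer because they make it easier to see what happens to the isolated True values.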

And as a final step, we label each group with scipy.ndimage.label:

>>> labeled_array, labels_count = label(dilated)
>>> labeled_array
array([1, 1, 1, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0])
>>> labels_count
3

You can see that labels_count equals c, the number of groups from the question. From here you can get the subgroups by boolean indexing:

>>> a[labeled_array == 1]
array([1, 4, 2])
>>> a[labeled_array == 3]
array([3, 4, 4, 5])
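
Boolean indexing scans the whole array once per label. As an alternative sketch (not part of the solution above), scipy.ndimage.find_objects returns one tuple of slices per label, which can be used to index a directly; the labeled_array below is copied from the output shown earlier:

```python
import numpy as np
from scipy.ndimage import find_objects

a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
labeled_array = np.array([1, 1, 1, 0, 2, 2, 0, 0, 0, 0,
                          0, 0, 0, 0, 0, 3, 3, 3, 3, 0])

# find_objects gives a list with one tuple of slices per label (1, 2, 3, ...)
regions = find_objects(labeled_array)
subarrays = [a[region] for region in regions]

for subarray in subarrays:
    print(subarray)
# [1 4 2]
# [4 4]
# [3 4 4 5]
```

Since each labeled group is contiguous in one dimension, the slice covers exactly the elements of that group.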
