How to group blocks of identical Booleans?

Question

Suppose I have the following list:

a = [True, True, True, False, False, False, False, True, True]

How could I best group them, either by returning only the start indices 0, 3, 7 or by producing a grouping like the following?

[True, True, True]
[False, False, False, False]
[True, True]

Background: I am trying to find plateaus in my NumPy arrays, and while setting the derivative to zero is a good start, I still need to sort the array into chunks. I think this basically boils down to the problem above.

I looked up NumPy and itertools (trying to adapt a solution from the question "NumPy grouping using itertools.groupby performance"), but I did not succeed. I guess one might use a combination of itertools.takewhile and itertools.filterfalse (see the documentation here), but I am out of my depth there. Or maybe I am just overcomplicating this.
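For the plateau use case described in the background, here is a minimal sketch that applies the run-finding idea to a numeric array. It assumes a plateau means two or more neighbouring samples that are exactly equal (a zero discrete derivative); for noisy floating-point data you would likely replace the `== 0` test with a tolerance such as `np.isclose(np.diff(y), 0)`. The function name `plateau_ranges` is hypothetical.

```python
import numpy as np

def plateau_ranges(y):
    """Return (start, stop) index pairs of runs where neighbouring values are equal.

    Sketch assumption: a "plateau" is any run of at least two exactly equal
    neighbouring samples, i.e. where the discrete derivative is zero.
    """
    y = np.asarray(y)
    flat = np.diff(y) == 0  # True where y[i] == y[i + 1]
    # Pad with False on both sides so every run of True has a rising and a falling edge
    padded = np.r_[False, flat, False]
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2]  # run of True in `flat` covers [start, end)
    # A run of k True values in `flat` spans k + 1 samples of y
    return [(int(s), int(e) + 1) for s, e in zip(starts, ends)]

print(plateau_ranges([1, 2, 2, 2, 3, 4, 4, 5]))  # -> [(1, 4), (5, 7)]
```

Each pair is a half-open slice, so `y[1:4]` is the plateau `[2, 2, 2]`.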

`itertools.groupby` is certainly the most obvious native Python way to do this; what exactly is wrong with it for you? Speed, memory, something else? — Chris_Rands, Nov 12 '18 at 16:03
Skills: I just have not managed to get a working solution with itertools.groupby — Eulenfuchswiesel, Nov 12 '18 at 16:05
`[list(g) for _,g in itertools.groupby(a)]` creates the lists... but getting the indices isn't so convenient, I guess; maybe `[next(g)[0] for _,g in itertools.groupby(enumerate(a), key=lambda x: x[1])]` — Chris_Rands, Nov 12 '18 at 16:05

score 6 · Accepted Answer · edited Jan 14 '21 at 06:34


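We can get the start indices with a sliced array comparison, which should perform well on large lists/arrays. This assumes `a` is a NumPy boolean array, so that `~` acts as a logical NOT (on a plain Python bool, `~True` is -2). Prepending the negated first element guarantees that index 0 always registers as a change:

import numpy as np
a = np.array([True, True, True, False, False, False, False, True, True])
a_ext = np.r_[~a[0], a]  # prepend the negated first element so index 0 counts as a change
out = np.flatnonzero(a_ext[:-1] != a_ext[1:])  # -> array([0, 3, 7])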

As a one-liner, we can combine np.diff and np.flatnonzero (on a boolean array, np.diff returns True wherever neighbouring values differ):

np.flatnonzero(np.diff(np.r_[~a[0],a]))
# compact alternative : np.where(np.diff(np.r_[~a[0],a]))[0]
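The answer returns only the start indices, while the question also asks for the groups themselves. A minimal sketch, assuming `a` is a NumPy boolean array, that feeds those indices to `np.split` and derives each run's length with `np.diff`:

```python
import numpy as np

a = np.array([True, True, True, False, False, False, False, True, True])

# Start index of every run, using the one-liner from the answer
starts = np.flatnonzero(np.diff(np.r_[~a[0], a]))

# np.split cuts before each given index; the leading 0 is dropped because
# splitting at 0 would produce an empty first chunk
groups = np.split(a, starts[1:])

# Run lengths: distance from each start to the next start (or to the end)
lengths = np.diff(np.r_[starts, a.size])

for g in groups:
    print(g)
print(lengths)  # -> [3 4 2]
```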

answered Nov 12 '18 at 16:05 by Divakar, edited Jan 14 '21 at 06:34 by Peter Mortensen

@Idlehands from the question "I'm trying to find plateaus in numpy arrays". So no, I don't think so. – user3483203 Nov 12 '18 at 16:11
@Idlehands Well, in my experience it's useful when performance on large data matters. So I think it depends on the requirements. – Divakar Nov 12 '18 at 16:12
@Eulenfuchswiesel Thanks for the edit suggestions. Edited. – Divakar Nov 13 '18 at 09:23

score 3 · Answer 2 · answered Nov 12 '18 at 16:04

The simplest way may be this:

a = [True, True, True, False, False, False, False, True, True]

res = [0] + [i+1 for i, (x, y) in enumerate(zip(a, a[1:])) if x!=y]
print(res)  # -> [0, 3, 7]
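If you need the groups rather than just the indices, the start list can be turned into slices without groupby. A minimal sketch, assuming `a` is a non-empty Python list (for an empty list it would produce a single empty group):

```python
a = [True, True, True, False, False, False, False, True, True]

# Start index of each run, as in the list comprehension above
res = [0] + [i + 1 for i, (x, y) in enumerate(zip(a, a[1:])) if x != y]

# Pair each start with the next start (or len(a) for the last run), then slice
bounds = list(zip(res, res[1:] + [len(a)]))
groups = [a[start:stop] for start, stop in bounds]
print(bounds)  # -> [(0, 3), (3, 7), (7, 9)]
print(groups)  # -> [[True, True, True], [False, False, False, False], [True, True]]
```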

As far as the groupby solution goes, you could do:

from itertools import groupby

groups = [list(g) for _, g in groupby(a)]
print(groups)  # -> [[True, True, True], [False, False, False, False], [True, True]]
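For plateau finding, run lengths are often more useful than the groups themselves. A sketch that uses the same groupby call to build a run-length encoding and then recovers the start indices from a running total:

```python
from itertools import groupby

a = [True, True, True, False, False, False, False, True, True]

# (value, run length) per block; sum(1 for _ in g) counts without building a list
runs = [(k, sum(1 for _ in g)) for k, g in groupby(a)]
print(runs)  # -> [(True, 3), (False, 4), (True, 2)]

# A running total of the lengths gives each block's start index
starts, pos = [], 0
for _, length in runs:
    starts.append(pos)
    pos += length
print(starts)  # -> [0, 3, 7]
```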

score 1 · Answer 3 · edited Jan 14 '21 at 06:36

You can do this completely with itertools.groupby:

Given

import itertools as it

a = [True, True, True, False, False, False, False, True, True]

Code

[list(g)[0][0] for _, g in it.groupby(enumerate(a), key=lambda x: x[-1])]
# [0, 3, 7]

Details

This is the output of groupby from your iterable:

[(k, list(g)) for k, g in it.groupby(a)]
# [(True, [True, True, True]),
#  (False, [False, False, False, False]),
#  (True, [True, True])]

We can enumerate the items so each one becomes an (index, value) tuple, then group by the last element of each tuple (the Boolean value):

[list(g) for k, g in it.groupby(enumerate(a), key=lambda x: x[-1])]
# [[(0, True), (1, True), (2, True)],
#  [(3, False), (4, False), (5, False), (6, False)],
#  [(7, True), (8, True)]]

Taking the first tuple of each group ([0]) and then its first element ([0]) gives the starting index of each group.

Chris_Rands' suggestion of [next(g)[0] ...] is even cleaner, since it takes only the first tuple of each group instead of building a list.
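A runnable version of that suggestion, which also keeps each group's Boolean value next to its start index (the name `labelled` is just for illustration):

```python
import itertools as it

a = [True, True, True, False, False, False, False, True, True]

# next(g) pulls only the first (index, value) tuple of each group,
# so no per-group list is built
starts = [next(g)[0] for _, g in it.groupby(enumerate(a), key=lambda x: x[1])]
print(starts)  # -> [0, 3, 7]

# The same pattern can keep the group's value alongside its start index
labelled = [(k, next(g)[0]) for k, g in it.groupby(enumerate(a), key=lambda x: x[1])]
print(labelled)  # -> [(True, 0), (False, 3), (True, 7)]
```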

See also this post on how to use groupby.
