Finding indices of subsequences of same values in a list ordered by length

Question

There is this similar question, but not quite what I am asking.

Let's say I have a list of ones and zeroes:

import numpy as np

# e.g. [1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
sample = np.random.randint(0, 2, (10,)).tolist()

I am trying to find the start index of each subsequence (run) of identical values, sorted by run length. So here, we would have the following sublists:

[1, 1, 1, 1]
[0, 0, 0]
[1]
[0]
[1]

So their indices would be [4, 1, 0, 8, 9].

I can get the sorted subsequences doing this:

sorted([list(l) for n, l in itertools.groupby(sample)], key=lambda l: -len(l))

However, if the same subsequence occurs more than once, I can't recover the indices directly from this result (I would have to use another loop).

I feel like there is a more straightforward and Pythonic way of doing what I'm after, just like the answer to the previous question suggests. This is what I'm looking for.

Can you show what the output is supposed to look like? – Willem Van Onsem, Mar 27 '17 at 10:37
I did just that: `[4, 1, 0, 8, 9]`. – dabadaba, Mar 27 '17 at 10:37

Willem Van Onsem · Accepted Answer · 2017-03-27T10:59:37.810

You can first create tuples of indices and values with enumerate(..). Next you groupby, but on the second element of each tuple (the value), and finally you map each group back to the index of its first element. Like:

map(lambda x:x[0][0], # obtain the index of the first element
    sorted([list(l) for _,l in itertools.groupby(enumerate(sample), # create tuples with their indices
                                                 key=lambda x:x[1])], # group on value, not on index
           key=lambda l: -len(l)))

When running (the compressed command) in the console, it produces:

>>> map(lambda x:x[0][0],sorted([list(l) for _,l in itertools.groupby(enumerate(sample),key=lambda x:x[1])],key=lambda l: -len(l)))
[4, 1, 0, 8, 9]

N.B. 1: instead of using lambda l: -len(l) as the key when you sort, you can use reverse=True (and key=len), which is more declarative, like:

map(lambda x:x[0][0],
    sorted([list(l) for _,l in itertools.groupby(enumerate(sample),
                                                 key=lambda x:x[1])],
           key=len, reverse=True))

N.B. 2: In Python 3, map produces an iterator and not a list. You can materialize the result by calling list(..) on it.
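
For Python 3, the same idea can be sketched with a list comprehension instead of map, so no extra list(..) call is needed. This is only an illustration of the approach above, not a different algorithm; the names runs and start_indices are made up for the example, and sample is hard-coded to the list from the question.

```python
from itertools import groupby

sample = [1, 0, 0, 0, 1, 1, 1, 1, 0, 1]

# Group (index, value) pairs by value, so each run keeps its positions.
runs = [list(group)
        for _, group in groupby(enumerate(sample), key=lambda pair: pair[1])]

# list.sort() is stable, so runs of equal length keep their original order.
runs.sort(key=len, reverse=True)

# The first pair of each run holds the run's starting index.
start_indices = [run[0][0] for run in runs]
print(start_indices)  # [4, 1, 0, 8, 9]
```

Because the sort is stable, the three runs of length 1 come out in the order they appear in the list (0, 8, 9), which matches the output asked for in the question.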

Although this question is tagged Python 2, it's probably worth mentioning that `map` returns an iterator in Python 3, not a list, so you need to wrap that in a `list()` call. Or just use a list comp instead of `map`, that also has a benefit of using indexing directly instead of calling a function to do the indexing for each item. — PM 2Ring, Mar 27 '17 at 10:57

styvane · Answer 2 · 2017-03-27T11:42:04.067

You can combine groupby and sorted with a generator function to do this efficiently.

from itertools import groupby
from operator import itemgetter

data = [1, 0, 0, 0, 1, 1, 1, 1, 0, 1]

def gen(items):
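    # Group on the value (second item of each pair), not on the (index, value) pair itself.
    for _, elements in groupby(enumerate(items), key=itemgetter(1)):
        indexes, values = zip(*elements)
        yield indexes[0], values

result = sorted(gen(data), key=lambda x: len(x[1]), reverse=True)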

Printing result yields:

[(4, (1, 1, 1, 1)), (1, (0, 0, 0)), (0, (1,)), (8, (0,)), (9, (1,))]
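
If only the start indices are wanted, as in the question, a variant of the generator can yield the run length instead of the values. This is a rough sketch under that assumption; run_starts, ordered and indices are hypothetical names chosen for the example.

```python
from itertools import groupby
from operator import itemgetter

data = [1, 0, 0, 0, 1, 1, 1, 1, 0, 1]

def run_starts(items):
    """Yield (start_index, run_length) for each run of equal values."""
    # Group (index, value) pairs on the value, so each group is one run.
    for _, elements in groupby(enumerate(items), key=itemgetter(1)):
        indexes = [index for index, _ in elements]
        yield indexes[0], len(indexes)

# Longest runs first; ties keep their original left-to-right order.
ordered = sorted(run_starts(data), key=itemgetter(1), reverse=True)
indices = [start for start, _ in ordered]
print(indices)  # [4, 1, 0, 8, 9]
```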
