Splitting a list of arbitrary size into N-not-equal parts

Question

I have seen the question about splitting a list of arbitrary size into roughly N equal parts. What about unequal splitting? I have a list of items, each with some attribute (a value that can be retrieved by running the same function against every item). How can I split the list so that items with the same attribute end up together in a new sublist? Could something lambda-related work here?

A simple example could be:

list = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]

After the fancy operation we would have:

list = [[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]

@LutzHorn I have a numeric property for every object, which can be retrieved via a function. It classifies objects as belonging to some particular group. Every object belongs to one and only one group. There can be as many groups as there are items in the list. - Katve, Jun 10 '14 at 07:55

score 2 · Accepted Answer · answered Jun 10 '14 at 07:47

>>> import itertools, operator
>>> L = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
>>> [list(g) for _, g in itertools.groupby(L)]
[[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]
>>> L2 = ['apple', 'aardvark', 'banana', 'coconut', 'crow']
>>> [list(g) for _, g in itertools.groupby(L2, operator.itemgetter(0))]
[['apple', 'aardvark'], ['banana'], ['coconut', 'crow']]
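
One caveat worth sketching: itertools.groupby only merges items that are adjacent, so if items with the same attribute are scattered through the list, it may help to sort by the same key first. The helper name group_by_key below is hypothetical, and the lambda stands in for whatever function retrieves the attribute in the question.

```python
import itertools


def group_by_key(items, key):
    """Group items that share the same key(item) value into sublists."""
    # groupby only merges adjacent items, so sort by the same key first.
    # sorted() is stable, so items keep their original order within a group.
    ordered = sorted(items, key=key)
    return [list(group) for _, group in itertools.groupby(ordered, key=key)]


words = ['crow', 'apple', 'banana', 'coconut', 'aardvark']
print(group_by_key(words, lambda w: w[0]))
# [['apple', 'aardvark'], ['banana'], ['crow', 'coconut']]
```

If the list is already ordered so that equal attributes sit next to each other, as in the question's example, the sort is unnecessary.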

score 0 · Answer 2 · answered Jun 10 '14 at 07:53

You should use the itertools.groupby function from the standard library.

This function groups consecutive elements of the iterable it receives (by default the key is the element itself, so consecutive elements are compared for equality). For each streak of grouped elements, it returns a 2-tuple consisting of the streak's key (by default the element itself) and an iterator over the elements within the streak.

Indeed:

>>> from itertools import groupby
>>> l = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
>>> [list(g) for _, g in groupby(l)]
[[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]
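
To make the 2-tuples described above visible, here is a small sketch that keeps the key next to each streak instead of discarding it. The variable names pairs and keys are illustrative only.

```python
from itertools import groupby

l = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]

# Each item produced by groupby is (key, iterator over the streak).
# The iterator is materialised with list() so it can be printed.
pairs = [(key, list(group)) for key, group in groupby(l)]
print(pairs)
# [(1, [1, 1, 1]), (2, [2]), (3, [3, 3, 3, 3]), (4, [4, 4])]

# Equal values that are not adjacent end up in separate streaks.
keys = [key for key, _ in groupby([1, 2, 1])]
print(keys)
# [1, 2, 1]
```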

P.S. You should avoid using list as a variable name, as it shadows the built-in type.

score 0 · Answer 3 · answered Jun 10 '14 at 07:55

Here's a pretty simple roll-your-own solution, which assumes you already know the size of each sublist. If the 'attribute' in question is simply the value of the item, there are more straightforward approaches.

def split_into_sublists(data_list, sizes_list):
    # The sizes must account for every item exactly once.
    if sum(sizes_list) != len(data_list):
        raise ValueError("sizes_list must sum to len(data_list)")

    count = 0
    output = []
    for size in sizes_list:
        output.append(data_list[count:count+size])
        count += size
    return output


if __name__ == '__main__':
    data_list = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
    sizes_list = [3,1,4,2]
    list2 = [[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]
    print(split_into_sublists(data_list, sizes_list) == list2) # True
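
The function above needs the sizes up front. As a rough sketch, they could be derived from the data by counting each run of consecutive equal items. This borrows itertools.groupby from the other answers, and run_lengths is a hypothetical helper name; split_into_sublists is repeated only so the sketch runs on its own.

```python
from itertools import groupby


def split_into_sublists(data_list, sizes_list):
    # Same slicing logic as above, repeated so this sketch is self-contained.
    if sum(sizes_list) != len(data_list):
        raise ValueError("sizes_list must sum to len(data_list)")
    count = 0
    output = []
    for size in sizes_list:
        output.append(data_list[count:count + size])
        count += size
    return output


def run_lengths(data_list, key=None):
    # Hypothetical helper: length of each run of consecutive items
    # that share the same key (the item itself when key is None).
    return [sum(1 for _ in group) for _, group in groupby(data_list, key)]


data_list = [1, 1, 1, 2, 3, 3, 3, 3, 4, 4]
sizes_list = run_lengths(data_list)
print(sizes_list)  # [3, 1, 4, 2]
print(split_into_sublists(data_list, sizes_list))
# [[1, 1, 1], [2], [3, 3, 3, 3], [4, 4]]
```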
