Why doesn't itertools groupby work as expected?

Question

I am trying to separate a list of integers into odd and even groups.

>>> from itertools import groupby
>>> L = [1,2,3,4]
>>> grouped = [list(g) for k,g in groupby(L, key=lambda x: x % 2)]
>>> grouped
[[1], [2], [3], [4]]

Clearly, not what I want. However, if I sort L first, using the same lambda key, it works as intended:

>>> L.sort(key=lambda x: x % 2)
>>> L
[2, 4, 1, 3]
>>> grouped = [list(g) for k,g in groupby(L, key=lambda x: x % 2)]
>>> grouped
[[2, 4], [1, 3]]

I don't understand why presorting is necessary. It seems groupby should iterate through the integers in list, assign each a value based on key function, and then group them -- regardless of the list order.

Please read the [documentation](https://docs.python.org/3/library/itertools.html#itertools.groupby) first, so you know what the functions you're using are actually supposed to do. — user2357112, Jan 30 '18 at 17:48
It is quite clearly explained in the docs: https://docs.python.org/2/library/itertools.html#itertools.groupby — Woody Pride, Jan 30 '18 at 17:49
"it seems" why does it seem? Did you read the documentation? — juanpa.arrivillaga, Jan 30 '18 at 17:51
You are supposed to sort the iterable by that key first, then group it! like: `L = sorted(L,key=lambda x:x%2)` — Ubdus Samad, Jan 30 '18 at 17:53
Got it. I had read the docs previously and thought I understood, but clearly I did not. Here is the salient point: "It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function)." I should have re-read this before posting my question. Sorry. — Steve, Jan 30 '18 at 17:59
@Steve note, the unsorted behavior is *very frequently* what you want, which is why the behavior exists. Sorting is quite fast in Python (especially in relation to the language itself), so if you do want to sort it, you shouldn't feel a compunction about it. If you absolutely cannot afford O(n*logn), and need O(n), then there are dictionary-based O(n) grouping options. — juanpa.arrivillaga, Jan 30 '18 at 19:33
@Steve Your question as it stands is a duplicate, but feel free to open a new question posting your current attempt and asking for the best approach to solving it in general — jamylak, Feb 10 '18 at 20:28

pylang · Answer 1 · 2018-07-03T08:55:34.040

itertools.groupby will group consecutive items that satisfy the key function. See the (key, group) pairs below. Notice at the end what happens to the adjacent even items:

>>> from itertools import groupby


>>> lst = [1, 2, 3, 4, 6, 8]
>>> grouped = [(k, list(g)) for k, g in groupby(lst, key=lambda x: x % 2)]
>>> grouped
[(1, [1]), (0, [2]), (1, [3]), (0, [4, 6, 8])]

Here are some ways to get groups of evens and odds:

>>> lst = [1, 2, 3, 4]

# OK, post-process groupby iterator
>>> grouped = [(k, list(g)) for k, g in groupby(lst, key=lambda x: x % 2)]
>>> evens = [x[1][0] for x in grouped if not x[0]]
>>> odds = [x[1][0] for x in grouped if x[0]]
>>> evens, odds
([2, 4], [1, 3])

# Better, pre-"sort"/rearrange iterable then groupby (see comments)
>>> key = lambda x: x % 2
>>> rearranged = sorted(lst, key=key)
>>> evens, odds = [(list(g)) for k, g in groupby(rearranged, key=key)]
>>> evens, odds
([2, 4], [1, 3])

# Even Better, simple list comprehensions
>>> evens, odds = [x for x in lst if not x % 2], [x for x in lst if x % 2]
>>> evens, odds
([2, 4], [1, 3])

See docs and this post for more on itertools.groupby.

Alternatives

For complex groupings, you may map a function and collect values in a defaultdict.

>>> import collections as ct


>>> def even_odd(elem):
...     key = "odd" if elem % 2 else "even"
...     return key, elem

>>> dd = ct.defaultdict(list)
>>> for k, v in map(even_odd, range(1, 5)):
...     dd[k].append(v)  
>>> dd
defaultdict(list, {'even': [2, 4], 'odd': [1, 3]})

score 1 · Answer 2 · answered Feb 10 '18 at 19:32

Got it. I had read the docs previously and thought I understood, but clearly I did not. Here is the salient point:

"It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function)." [Emphasis mine.]

I should have re-read this before posting my question. Sorry.

Why doesn't itertools groupby work as expected?

2 Answers2