
I was trying to use itertools.groupby to group a list of integers into runs of positive and negative numbers, for example:

the input

[1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]

should return

[[1,2,3],[-1,-2,-3],[1,2,3],[-1,-2,-3]]

However, if I run this:

import itertools

nums = [1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
group_list = list(itertools.groupby(nums, key=lambda x: x>=0))
print(group_list)
for k, v in group_list:
    print(list(v))
>>>
[]
[-3]
[]
[]

But if I don't call list() on the groupby object, it works fine:

nums = [1,2,3, -1,-2,-3, 1,2,3, -1,-2,-3]
group_list = itertools.groupby(nums, key=lambda x: x>=0)
for k, v in group_list:
    print(list(v))
>>>
[1, 2, 3]
[-1, -2, -3]
[1, 2, 3]
[-1, -2, -3]

What I don't understand is this: a groupby object is an iterator that yields pairs of a key and a _grouper object, so why should calling list() on the groupby object consume the _grouper objects?

And even if it did consume them, where did the [-3] in the second group come from?

ShadowRanger
jxie0755
  • Just use a list comprehension: `groups = [list(g) for _, g in groupby(nums, lambda n: n >= 0)]`. – Christian Dean Feb 07 '18 at 02:49
  • https://stackoverflow.com/questions/773/how-do-i-use-pythons-itertools-groupby – BENY Feb 07 '18 at 02:51
  • @ChristianDean Hey there, it's you again! I understand how to do it right, but I don't understand why calling `list()` makes things go wrong. – jxie0755 Feb 07 '18 at 02:52
  • @Code_Control_jxie0755 Yep, I patrol the site quite frequently :-). Were you still confused after reading the answer posted below? If so, by what? – Christian Dean Feb 07 '18 at 02:54
  • @ChristianDean after reading that additional paragraph, now I understand! – jxie0755 Feb 07 '18 at 02:57

1 Answer


The docs explicitly note that advancing the groupby object renders the previous group unusable (in practice, empty):

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
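A small sketch of what that paragraph means, using only the real `itertools.groupby` (the variable names are illustrative). It reads one value from the first group, advances the `groupby` object, and then tries to read the rest of the first group:

```python
from itertools import groupby

nums = [1, 2, 3, -1, -2, -3]
groups = groupby(nums, key=lambda x: x >= 0)

key1, g1 = next(groups)   # first group (non-negative values)
first_item = next(g1)     # read one value while g1 is still the current group
key2, g2 = next(groups)   # advancing groupby skips the rest of g1 (2 and 3)

stale = list(g1)          # g1 was left behind, so its remaining values are gone
current = list(g2)        # g2 is the current group, so it is intact

print(first_item, stale, current)  # 1 [] [-1, -2, -3]
```

Both groups draw from the same underlying iterator, so once `groupby` has pulled past `2` and `3` to find the start of the next group, nothing is left for `g1` to yield.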

Instead of passing the groupby object straight to the list constructor, you need a list comprehension that converts each group iterator to a list before the groupby object is advanced. Replace:

group_list = list(itertools.groupby(nums, key=lambda x: x>=0))

with:

group_list = [(k, list(g)) for k, g in itertools.groupby(nums, key=lambda x: x>=0)]
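A runnable version of that replacement applied to the input from the question, plus a step that drops the keys to get exactly the shape the question asks for. Note that the key `x >= 0` puts `0` (if present) in the non-negative group:

```python
import itertools

nums = [1, 2, 3, -1, -2, -3, 1, 2, 3, -1, -2, -3]

# list(g) runs before the comprehension asks groupby for the next pair,
# so each group is copied while it is still the current group.
group_list = [(k, list(g)) for k, g in itertools.groupby(nums, key=lambda x: x >= 0)]
groups = [g for _, g in group_list]  # drop the True/False keys

print(group_list)  # [(True, [1, 2, 3]), (False, [-1, -2, -3]), (True, [1, 2, 3]), (False, [-1, -2, -3])]
print(groups)      # [[1, 2, 3], [-1, -2, -3], [1, 2, 3], [-1, -2, -3]]
```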

Most itertools types are designed to avoid storing data implicitly, because they're meant to work with potentially huge (even infinite) inputs. If every grouper stored its own copy of the input data (and the groupby object had to retroactively populate groups as it skipped past them), the implementation would get ugly and could exhaust memory by accident. By making you store the values explicitly, itertools ensures you never buffer an unbounded amount of data without meaning to, per the Zen of Python:

Explicit is better than implicit.
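To illustrate why the lazy design matters, here is a sketch that groups an infinite input, `itertools.count()`, into blocks of three (the `n // 3` key is just an example). Because `groupby` never buffers anything on its own, this only reads as far as it is asked to, and the caller decides explicitly how much to store:

```python
from itertools import count, groupby, islice

# count() never ends, so list(count()) would run forever; groupby only pulls
# values from it on demand.
blocks = groupby(count(), key=lambda n: n // 3)

# Store exactly what is needed: three groups, each copied before groupby advances.
first_three = [list(g) for _, g in islice(blocks, 3)]
print(first_three)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```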

ShadowRanger
  • I understand that the second code snippet is the replacement you suggest. But I still don't quite get the passage you quoted from the docs, and I don't understand where that `[-3]` comes from – jxie0755 Feb 07 '18 at 02:51
  • I'm pretty sure the OP only wants the groups collected, so `groups = [list(g) for _, g in groupby(nums, lambda n: n >= 0)]` should suffice. – Christian Dean Feb 07 '18 at 02:51
  • @Code_Control_jxie0755: Every time you pull a new `key`/`group` pair from a `groupby` (advancing the `groupby` iterator), any existing `group`s are effectively emptied. `groupby` is super lazy; it's only keeping one copy of the underlying iterator and advancing it on demand, either once per iteration of the group, or skipping all remaining members of the group (if the `groupby` object itself is advanced). There is no separate state. – ShadowRanger Feb 07 '18 at 02:56
  • Thanks! "per the Zen of Python"! – jxie0755 Feb 07 '18 at 02:56
  • @Code_Control_jxie0755: The `[-3]` is an implementation quirk; when you run out the groupby, it's left in a state handling a negative group. The first negative group object you read from doesn't really know it's invalid, so it pulls the final value in the cache and says "hey, this is totally part of my group" and yields it. It probably shouldn't, so this is a minor bug in the implementation (don't rely on it), but it's not really that important; using a group iterator after advancing past it is the closest thing Python has to undefined behavior, so behaving weirdly isn't really unexpected. – ShadowRanger Feb 07 '18 at 02:59
  • @ShadowRanger Got it! You explained everything very clearly! Thanks! – jxie0755 Feb 07 '18 at 03:00
  • @Code_Control_jxie0755: Cool. If you want a clearer demonstration of where the `-3` is coming from, change `nums` to `nums = [1,2,3,-1,-2,-3,4,5,6,-4,-5,-6]`, then do `[(k, list(g)) for k, g in list(itertools.groupby(nums, key=lambda x: x>=0))]`. You'll notice the second group isn't producing `-3` now, it's producing `-6` (the last member of the fourth group). Like I said, it's essentially undefined behavior and an implementation quirk. – ShadowRanger Feb 07 '18 at 03:07
  • @ShadowRanger I see. So that `-6` is from the last element, but somehow it is still considered to be from the second group. – jxie0755 Feb 07 '18 at 03:09
  • @Code_Control_jxie0755: Yeah. The first group that is advanced with a `key` value that matches `-6` happens to be pulling the `-6` that got cached inside the `groupby` object (it needs to cache up to one value when a group ends so the data isn't lost for the next group; it should have cleared it when the iterator was exhausted, but looks like the implementation didn't do so explicitly, and it just hung around). The second group looks for all the negative values, finds the one in the cache, yields it and clears the cache, and doesn't realize its "real" group had long since expired. – ShadowRanger Feb 07 '18 at 03:15
  • @ShadowRanger This is truly interesting and educational, thanks again! – jxie0755 Feb 07 '18 at 03:23
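The mechanism described in the comments above (one shared iterator, a one-value cache that is not cleared on exhaustion, and groups that never check whether they are still current) can be sketched as a toy model. `ToyGroupBy` is a hypothetical class written for illustration, loosely following the logic described in those comments; it is not CPython's actual code. Under those assumptions it reproduces both the `[-3]` from the question and the `[-6]` variant. The pure-Python equivalent in the current `itertools` docs includes an extra check that stops a group once `groupby` has moved past it, so on newer Pythons the stale groups may all come back empty instead.

```python
_EMPTY = object()  # sentinel meaning "nothing cached"


class ToyGroupBy:
    """Hypothetical toy model of groupby's shared state (not the real implementation)."""

    def __init__(self, iterable, key):
        self.it = iter(iterable)                # ONE underlying iterator, shared by all groups
        self.key = key
        self.cached = self.cached_key = _EMPTY  # one look-ahead value
        self.tgtkey = _EMPTY                    # key of the most recently started group

    def _step(self):
        """Pull the next value into the cache; return False once the input is exhausted."""
        try:
            value = next(self.it)
        except StopIteration:
            return False  # the old cached value is deliberately left in place
        self.cached, self.cached_key = value, self.key(value)
        return True

    def __iter__(self):
        return self

    def __next__(self):
        # Skip the rest of the current group until the cache holds a new key.
        while self.cached is _EMPTY or (
            self.tgtkey is not _EMPTY and self.cached_key == self.tgtkey
        ):
            if not self._step():
                raise StopIteration
        self.tgtkey = self.cached_key
        return self.tgtkey, self._grouper(self.tgtkey)

    def _grouper(self, tgtkey):
        # A group never checks whether it is still the current one: it reads
        # the shared cache and accepts any value whose key matches its own.
        while True:
            if self.cached is _EMPTY and not self._step():
                return
            if self.cached_key != tgtkey:
                return
            value = self.cached
            self.cached = self.cached_key = _EMPTY  # hand the value out and clear the cache
            yield value


def is_non_negative(x):
    return x >= 0


nums = [1, 2, 3, -1, -2, -3, 1, 2, 3, -1, -2, -3]
stale = [list(g) for _, g in list(ToyGroupBy(nums, is_non_negative))]
fresh = [list(g) for _, g in ToyGroupBy(nums, is_non_negative)]
print(stale)  # [[], [-3], [], []]
print(fresh)  # [[1, 2, 3], [-1, -2, -3], [1, 2, 3], [-1, -2, -3]]
```

In the `stale` case, `list()` drives the model to exhaustion while `-3` (key `False`) is still sitting in the cache. The first group's key `True` doesn't match, the second group's key `False` does, so it yields `-3` and clears the cache, and the remaining groups find nothing left.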