If you uncomment the commented line below, then the output will change (for all but the last key, the grouper object will be empty). Why is this?

from itertools import groupby

c = groupby(['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 10, ('persons', 'man', 'woman')])
#c = list(c)
dic = {}
for k, v in c:
    dic[k] = list(v)
print(dic)
Raymond Hettinger
James Ko
  • Nice answer: https://stackoverflow.com/questions/48475888/list-around-groupby-results-in-empty-groups?rq=1 – Javier Feb 23 '18 at 00:00
  • @Javier That's a question. – Stefan Pochmann Feb 23 '18 at 00:05
  • I don't know how to reference answers. I meant the accepted answer in that question. – Javier Feb 23 '18 at 00:06
  • @Javier Baw... I was hoping you meant *my* answer there. And under each answer, you can click "Share" to get a link to it. – Stefan Pochmann Feb 23 '18 at 00:08
  • Thanks! Actually, I did: https://stackoverflow.com/a/48476719/3339058 – Javier Feb 23 '18 at 00:09
  • If you're seeking to enumerate the flattened contents of the groups as you go, then try: `dic = {grp[0]: list(grp[1]) for grp in c}`. This consumes (via `list` construction) the inner `_grouper` for each key sequentially, whereas `list(c)` consumes (and instantly discards) the interior `_grouper` values, leaving each `_grouper` empty upon later examination. – ely Feb 23 '18 at 00:12

1 Answer

Summary: The reason is that itertools generally do not store data. They just consume an iterator. So when the outer iterator advances, the inner iterator must as well.

Analogy: Imagine you are a flight attendant standing at the door, admitting a single line of passengers to an aircraft. The passengers are arranged by boarding group, but you can only see and admit them one at a time. As people enter, you learn when one boarding group has ended and the next has begun.

To advance to the next group, you're going to have to admit all the remaining passengers in the current group. You can't see what is downstream in line without letting all the current passengers through.
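
A minimal sketch of this behavior, using made-up sample data (the list and the variable names are illustrative, not from the question): once the outer iterator moves on to the second group, the first group's iterator has nothing left to give.

```python
from itertools import groupby

# Illustrative data: three runs of equal items.
data = ['a', 'a', 'b', 'b', 'c']

outer = groupby(data)
key1, group1 = next(outer)   # first group: key 'a', not yet consumed
key2, group2 = next(outer)   # advancing the outer iterator skips past the 'a' run

leftover = list(group1)      # the 'a' items were already consumed from the shared source
current = list(group2)       # the current group is still readable

print(key1, leftover)
print(key2, current)
```

Calling list() on the whole groupby object does the same thing repeatedly: it advances past every group before any of them has been read.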

Unix comparison: The design of groupby() is algorithmically similar to the Unix uniq utility.
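
To illustrate the comparison, here is a small sketch with made-up input. Like uniq, groupby() only merges adjacent equal items, so a value that reappears later starts a new group. Counting each run roughly mirrors what uniq -c reports.

```python
from itertools import groupby

# Illustrative input: 'a' appears in two separate runs.
letters = ['a', 'a', 'b', 'a']

# Consume each group immediately, while it is still the current one.
runs = [(key, len(list(group))) for key, group in groupby(letters)]
print(runs)
```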

What the docs say: "The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible."

How to use it: If the data is needed later, it should be stored as a list:

groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    groups.append(list(g))      # Store group iterator as a list
    uniquekeys.append(k)
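
For a version that runs as-is, the sketch below fills in hypothetical values for data and keyfunc (neither comes from the question) and builds a dictionary, consuming each group inside the loop before the outer iterator advances.

```python
from itertools import groupby

# Hypothetical sample data and key function, chosen only for illustration.
data = ['goat', 'dog', 'cow', 'horse', 'hen']
keyfunc = len

by_length = {}
for k, g in groupby(sorted(data, key=keyfunc), keyfunc):
    by_length[k] = list(g)   # materialize the group before the next iteration

print(by_length)
```

Sorting first matters here because groupby() only groups consecutive items; without it, the same key could show up more than once and later entries would overwrite earlier ones in the dictionary.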
Raymond Hettinger