Summary: The reason is that the tools in itertools generally do not store data. They just consume an iterator. So when the outer iterator advances, the inner iterator must advance as well.
Analogy: Imagine you are a flight attendant standing at the door, admitting a single line of passengers to an aircraft. The passengers are arranged by boarding group, but you can only see and admit them one at a time. As people enter, you learn when one boarding group has ended and the next has begun.
To advance to the next group, you're going to have to admit all the remaining passengers in the current group. You can't see what is downstream in line without letting all the current passengers through.
Unix comparison: The design of groupby() is algorithmically similar to the Unix uniq utility.
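To make the uniq comparison concrete, here is a minimal sketch of uniq-like helpers built on groupby(). The names uniq and uniq_c are hypothetical, chosen only for illustration; they roughly mirror `uniq` and `uniq -c` on already-adjacent duplicates.

```python
from itertools import groupby

def uniq(items):
    """Collapse each run of adjacent equal items into one item (hypothetical helper)."""
    for key, _group in groupby(items):
        yield key

def uniq_c(items):
    """Yield (run_length, item) pairs, roughly like `uniq -c` (hypothetical helper)."""
    for key, group in groupby(items):
        # The group must be consumed here, before the outer loop advances.
        yield sum(1 for _ in group), key

lines = ["a", "a", "b", "a"]
collapsed = list(uniq(lines))
counted = list(uniq_c(lines))
```

Like uniq, this only merges adjacent duplicates, so the trailing "a" forms its own run. That is why the input is usually sorted first.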
What the docs say: "The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible."
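A small sketch of what that means in practice, using made-up input data. Advancing the groupby() object while an earlier group is only partly consumed leaves that earlier group with nothing more to yield:

```python
from itertools import groupby

data = "AAABBC"                   # illustrative input, already grouped by adjacency
grouper = groupby(data)

key_a, group_a = next(grouper)    # first group: the run of 'A'
first = next(group_a)             # consume one 'A' from that group
key_b, group_b = next(grouper)    # advancing the outer iterator skips the remaining 'A's

leftover_a = list(group_a)        # the earlier group no longer yields anything
items_b = list(group_b)           # the current group is still readable
```

Here leftover_a is empty even though two 'A' items were never read through group_a: they were consumed from the shared source when the outer iterator moved on.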
How to use it: If the data is needed later, it should be stored as a list:
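from itertools import groupby

groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)  # groupby() only groups adjacent items, so sort by the same key first
for k, g in groupby(data, keyfunc):
    groups.append(list(g))    # Store group iterator as a list
    uniquekeys.append(k)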
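
For a runnable version of the same pattern, here is a sketch with made-up data. The word list and the first_letter key function are assumptions for illustration, not part of the docs recipe.

```python
from itertools import groupby

def first_letter(word):
    """Illustrative key function: group words by their first character."""
    return word[0]

words = ["banana", "apple", "blueberry", "avocado", "cherry"]  # made-up data

# Sort by the same key used for grouping, then materialize each group
# as a list while it is still the current group.
by_letter = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
}
```

Because each group is converted to a list inside the comprehension, the results stay available after the groupby() object has moved on.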