0

I am trying to solve a simple problem using itertools.groupby: group the numbers from 0 to 7 according to the number of 1's in their binary representation. So I want to produce the mapping

{0: [0], 1: [1, 2, 4], 2: [3, 5, 6], 3: [7]}

But here is what I get from groupby:

>>> from itertools import groupby
>>> def key(i):
...     print(i, bin(i), bin(i).count('1'))
...     return bin(i).count('1')
>>> groups = {k: list(v) for k, v in groupby(range(8), key=key)}
0 0b0 0
1 0b1 1
2 0b10 1
3 0b11 2
4 0b100 1
5 0b101 2
6 0b110 2
7 0b111 3
>>> groups
{0: [0], 1: [4], 2: [5, 6], 3: [7]}

The result has me absolutely baffled. The print statements show that the individual calls to the key function behave as expected, and yet I loose the numbers 1, 2, 3 along the way. It get's even worse when I use e.g. 16:

>>> {k: list(v) for k, v in groupby(range(16), key=lambda i: bin(i).count('1'))}
{0: [0], 1: [8], 2: [12], 3: [13, 14], 4: [15]}

I am hoping to understand how groupby arrives at this result, and to learn if their is a way to solve this using itertools. (I am not looking for a solution to the problem as such, only for a fancy generator solution using e.g. itertools.)

(I've tried this in python 3.9 and 3.10 so I'm fairly certain it is not a bug)

tBuLi
  • 2,295
  • 2
  • 16
  • 16

1 Answers1

1

If you want to use groupby you need to sort input list first.

groups = {k: list(v) for k, v in groupby(sorted(range(8), key=key), key=key)}

Your generator discards old entries when same group is encountered later.

You are already using dict so you don't need to use groupby at all

d = defaultdict(list)
for i in range(8):
    d[key(i)].append(i)
ekim boran
  • 1,809
  • 12
  • 22
  • Thanks for the explanation! The need for an extra sort does perhaps make the groupby solution less readable than the direct implementation using defaultdict, which is also the solution I'm currently using. However, I was looking to feed this building block into itertools' combinations and products later on and for larger numbers, so that's why I thought a generator might result in more performant code. However, sorted makes a new sequence in memory anyway so I guess there is nor performance benefit to using groupby then. – tBuLi Sep 25 '22 at 09:45
  • 1
    @tBuLi Happy to help! If you think about it there is no way to group incrementally even if group by was implemented without need for sorting it would use some kind of dictionary under the hood. I am guessing that is why it is implemented for sorted sequences only. – ekim boran Sep 25 '22 at 09:50
  • Yeah you are right. Once I saw your answer I realized that it makes sense, how could this work incrementally? I guess I was blinded by my desire to make a generator out of everything, but sometimes you really need to put values in memory ahead of time. Thanks again! – tBuLi Sep 25 '22 at 10:41