18

I'm trying to group together the binary strings of certain numbers based on how many 1's there are in the string.

This doesn't work:

s = "0 1 3 7 8 9 11 15"
numbers = map(int, s.split())
binaries = [bin(x)[2:].rjust(4, '0') for x in numbers]

one_groups = dict.fromkeys(range(5), [])
for x in binaries:
    one_groups[x.count('1')] += [x]

The expected dictionary one_groups needs to be

{0: ['0000'], 
 1: ['0001', '1000'], 
 2: ['0011', '1001'], 
 3: ['0111', '1011'], 
 4: ['1111']}

But I get

{0: ['0000', '0001', '0011', '0111', '1000', '1001', '1011', '1111'], 
 1: ['0000', '0001', '0011', '0111', '1000', '1001', '1011', '1111'], 
 2: ['0000', '0001', '0011', '0111', '1000', '1001', '1011', '1111'], 
 3: ['0000', '0001', '0011', '0111', '1000', '1001', '1011', '1111'], 
 4: ['0000', '0001', '0011', '0111', '1000', '1001', '1011', '1111']}

So far the only thing that has worked is if I use one_groups[x.count('1')] = one_groups.get(x.count('1')) + [x] instead of one_groups[x.count('1')] += [x]

But why is that so? If I recall correctly, isn't dict[key] supposed to return the value of that dictionary, similar to how dict.get(key) works? I've seen this thread Why dict.get(key) instead of dict[key]? but it didn't answer my question for this particular case, since I know for sure the program isn't meant to get the KeyError

I have also tried one_groups[x.count('1')].append(x) but this doesn't work either.

Boann
  • 48,794
  • 16
  • 117
  • 146
SpectraXCD
  • 309
  • 2
  • 6
  • 8
    `get` return `None` if the key does not exist or any provided default value, while the index operator `[]` raise an error if key does not exist. – adnanmuttaleb Nov 04 '19 at 14:57
  • Sidenote, `bin(x)[2:].rjust(4, '0')` can be simplified to `'{:0>4b}'.format(x)`. – wjandrea Nov 05 '19 at 02:54
  • 1
    BTW it helps to make a [mre]. In this case how you make `binaries` isn't relevant to the question, so you could just provide its value. – wjandrea Nov 05 '19 at 03:14
  • 1
    Does this answer your question? [dict.fromkeys all point to same list](https://stackoverflow.com/questions/15516413/dict-fromkeys-all-point-to-same-list) – Georgy Nov 05 '19 at 09:09

3 Answers3

24

The problem is mutability:

one_groups = dict.fromkeys(range(5), []) - this passes the same list as value to all keys. So if you change one value, you change them all.

It's basically the same as saying:

tmp = []
one_groups = dict.fromkeys(range(5), tmp)
del tmp

If you want to use a new list, you need to do it in a loop - either an explicit for loop or in a dict comprehension:

one_groups = {key: [] for key in range(5)}

This thing will "execute" [] (which equals to list()) for every key, thus making the values with different lists.


Why does get work? Because you explicitly take the current list, but + makes a new result list. And it doesn't matter whether it's one_groups[x.count('1')] = one_groups.get(x.count('1')) + [x] or one_groups[x.count('1')] = one_groups[x.count('1')] + [x] - what matters is that there's +.

I know how everybody says a+=b is just a=a+b, but the implementation may be different for optimisation - in case of lists, += is just .extend because we know we want our result in the current variable, so creating new list would be waste of memory.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
h4z3
  • 5,265
  • 1
  • 15
  • 29
  • Ah, yes, understood. I also remember having a similar problem when I wanted to create a 2D list using ``mylist = [[] * 5] * 5`` and how ``mylist = [[] for x in range(5)] * 5`` would've fixed it. Just for quick clarification, from I understood, this happens because of the variables pointing to the memory address of that empty list. Does this also mean the problem wouldn't occur if I used primitives instead? – SpectraXCD Nov 04 '19 at 15:27
  • 1
    Yes, if you used primitives this will solve it, but will break ```one_groups[x.count('1')] += [x]``` because you can't add a list to a primitive type. A better solution is to use defaultdict instead. – Fakher Mokadem Nov 04 '19 at 15:43
  • 4
    specifically, `+` calls `__add__` and returns a new object, while `+=` calls `__iadd__`, and is not required to return a new object – njzk2 Nov 05 '19 at 04:15
8

The problem is using one_groups = dict.fromkeys(range(5), [])

(This passes the same list as value to all keys. So if you change one value, you change them all)


You can use this instead: one_groups = {i:[] for i in range(5)}

(This thing will "execute" [] (which equals to list()) for every key, thus making the values with different lists.)

Hameda169
  • 608
  • 3
  • 9
4

This is the help on dict's fromkeys method.

Help on built-in function fromkeys:

fromkeys(iterable, value=None, /) method of builtins.type instance Create a new dictionary with keys from iterable and values set to value

That says that fromkeys will accept a value, and even it is a callable, it will evaluate it first, and then assign that value to all the dict keys.

Lists are mutable in Python, and so it will assign the same empty list reference and one change will affect them all.

Use defaultdict instead as so:

>>> from collections import defaultdict
>>> one_groups = defaultdict(list)
>>> for x in binaries:
      one_groups[x.count('1')] += [x]
>>> one_groups = dict(one_groups) # to stop default dict behavior

This will accept assignments to non-existing keys and values will default to empty lists (in this case).

Fakher Mokadem
  • 1,059
  • 1
  • 8
  • 22