
EDIT:

Why does [defaultdict(int)] * 3 return three references to the same object?

Original Title

Unpack list of defaultdicts into variables has unexpected behavior in Python

Unpacking an initialized list of defaultdict objects into variables does not appear to work the way I would expect. Does anyone know why it behaves this way (see the code snippets below)? I'm using Python 3.9.1.

from collections import defaultdict

# Equivalent behavior - works OK
a,b,c = [int(), int(), int()]
d,e,f = [int()] * 3

# Expected equivalent behavior - BEHAVES DIFFERENTLY
l,m,p = [defaultdict(int), defaultdict(int), defaultdict(int)]
q,r,s = [defaultdict(int)] * 3

Full snippet:

>>> a,b,c = [int(), int(), int()]
>>> a+=4; b+=2; c+=7
>>> a,b,c
(4, 2, 7)
>>> d,e,f = [int()] * 3
>>> d+=11; e+=8; f+= 41
>>> d,e,f
(11, 8, 41)

>>> from collections import defaultdict
>>> l,m,p = [defaultdict(int), defaultdict(int), defaultdict(int)]
>>> l['a']+=1; m['b']+=2; m['c']+=3;
>>> l,m,p
(
  defaultdict(<class 'int'>, {'a': 1}), 
  defaultdict(<class 'int'>, {'b': 2, 'c': 3}), 
  defaultdict(<class 'int'>, {})
)
>>> q,r,s = [defaultdict(int)] * 3
>>> q['a']+=111; r['b']+=222; m['c']+=333;
>>> q,r,s
(
  defaultdict(<class 'int'>, {'a': 111, 'b': 222}), 
  defaultdict(<class 'int'>, {'a': 111, 'b': 222}), 
  defaultdict(<class 'int'>, {'a': 111, 'b': 222})
)
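A quick check (added here for illustration; the name shared is hypothetical) suggests that the unpacking is not what causes this: the list built by [defaultdict(int)] * 3 already holds three references to one object before any names are assigned.

```python
from collections import defaultdict

# Build the repeated list without unpacking it into separate names.
shared = [defaultdict(int)] * 3

# Every element is the very same object as the first one.
print(all(item is shared[0] for item in shared))  # True

# Mutating through one element is visible through all of them.
shared[0]['a'] += 111
print([dict(item) for item in shared])  # [{'a': 111}, {'a': 111}, {'a': 111}]
```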

This question builds on the question "Unpack list to variables".

  • What unexpected behaviour are you talking about exactly? That `[defaultdict(int)] * 3` results in a list with three references to the same defaultdict? – Iain Shelvington Oct 13 '21 at 16:50
  • Thanks, that's exactly the question I have. I've updated the title to reflect this. – rcodemonkey Oct 13 '21 at 17:02
  • This is fundamentally the exact same problem [as described here](https://stackoverflow.com/questions/240178/list-of-lists-changes-reflected-across-sublists-unexpectedly). The fix in this case is something like `q,r,s = [defaultdict(int) for _ in range(3)]` – Cory Kramer Oct 13 '21 at 17:03
  • The comments on your first code snippet are wrong: `[int()] * 3` behaves exactly the same as `[defaultdict(int)] * 3`! In both cases, the same reference is copied three times. See for yourself: `[id(x) for x in [d, e, f]]`. – Konrad Rudolph Oct 13 '21 at 17:28
  • @KonradRudolph I just tested `[int()]*3` myself with the same results as OP. Also, see my answer below as to why ints can have the same id but act independently. – whege Oct 13 '21 at 17:30
  • @KonradRudolph oh my bad, I didn't see you there over my shoulder watching me work...if I didn't test it I wouldn't have said so – whege Oct 13 '21 at 17:32
  • @LiamFiddler I’m well aware why ints “behave differently” but regardless OP is wrong about the semantics of their code (and whether they differ); this is a very common misunderstanding of reference types and immutability, but these are completely orthogonal issues. The last paragraph of your answer does indicate that you’re also aware of this. – Konrad Rudolph Oct 13 '21 at 17:33
  • @KonradRudolph if you're well aware of it, then it shouldn't surprise you that he can run that test on ints and have each element change independently, and not have the same behavior with defaultdict. It seems to me his question was "these two behave the same for ints, why not for defaultdict" so it's entirely a question of mutability and the way Python handles ints in memory – whege Oct 13 '21 at 17:35
  • @LiamFiddler Yes, and I’m not questioning that. All I’m saying is that the behaviour *of the declaration* is the same for ints and defaultdicts, contrary to OP’s comments in the first code snippet. Because … it is. – Konrad Rudolph Oct 13 '21 at 17:36
  • @KonradRudolph ah I'm following now; my apologies – whege Oct 13 '21 at 17:38
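The fix suggested in the comments can be sketched as follows. A list comprehension evaluates defaultdict(int) once per iteration, so each name should get its own dictionary (the keys and values are just the ones from the question):

```python
from collections import defaultdict

# defaultdict(int) is called three times, producing three distinct objects.
q, r, s = [defaultdict(int) for _ in range(3)]

q['a'] += 111
r['b'] += 222
s['c'] += 333

print(q is r, r is s)  # False False
print(dict(q), dict(r), dict(s))  # {'a': 111} {'b': 222} {'c': 333}
```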

1 Answer


The issue is with locations in memory. A simple console test shows this:

>>> from collections import defaultdict
>>> l,m,p = [defaultdict(int), defaultdict(int), defaultdict(int)]
>>> id(l) == id(p)
False
>>> id(m) == id(p)
False

Now let's try the other way:

>>> l,m,p = [defaultdict(int)] * 3
>>> id(l) == id(p)
True
>>> id(m) == id(p)
True

In the first case, you are creating three separate defaultdict objects in memory. In the second, you are creating one defaultdict, and the list repetition * 3 produces a list holding three references to that same object. Unpacking then binds l, m and p to that one object, so when you update it through any of the names, the change shows up through all of them.
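One way to see that * 3 copies references rather than calling the constructor again is to count constructor calls. This is a minimal sketch; make_counter_dict and the calls counter are hypothetical helpers introduced only for this illustration:

```python
from collections import defaultdict

calls = 0


def make_counter_dict():
    """Hypothetical helper: returns a fresh defaultdict(int) and counts the calls."""
    global calls
    calls += 1
    return defaultdict(int)


# Three separate calls: three distinct objects.
separate = [make_counter_dict(), make_counter_dict(), make_counter_dict()]
print(calls, len({id(x) for x in separate}))  # 3 3

# One call, then list repetition copies the reference three times.
calls = 0
repeated = [make_counter_dict()] * 3
print(calls, len({id(x) for x in repeated}))  # 1 1
```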

This answer goes into some more detail on why this happens with certain datatypes, but not others. TL;DR: repetition copies the reference with ints too, so id() or is checks on d, e and f show that they start out as the same object (CPython additionally caches small ints such as 0 as an optimization). They still behave independently because ints are immutable: d += 11 cannot modify the shared object, so it creates a new int and rebinds only d. A defaultdict is mutable, so q['a'] += 111 changes the shared object in place.
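A small sketch of the int case from the question (the name before is just for illustration): the three names share one object until += rebinds one of them to a new int.

```python
# Repetition copies the reference to a single int object (0) three times.
d, e, f = [int()] * 3
print(d is e is f)  # True

before = id(d)
# += on an immutable int cannot change the shared object;
# it produces a new int and rebinds only the name d.
d += 11
print(id(d) == before)  # False
print(d, e, f)  # 11 0 0
print(e is f)  # True
```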

whege