Dictionary creation with fromkeys and mutable objects. A surprise

Question

I came across this behavior that surprised me in Python 2.6 and 3.2:

>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}

However, dict comprehensions in 3.2 show a more polite demeanor:

>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>

Why does fromkeys behave like that?

the difference is the same as in `[[]]*2` and `[[] for _ in range(2)]`. — jfs, Nov 17 '11 at 21:51
@J.F.Sebastian I am used to the meaning of [[]]*2 and other gotchas alike. But fromkeys got me by surprise. Maybe is just a question of familiarity...I practically never use the fromkeys method... — joaquin, Nov 17 '11 at 22:01

score 22 · Accepted Answer · edited Aug 03 '18 at 17:56

22

Your Python 2.6 example is equivalent to the following, which may help to clarify:

>>> a = []
>>> xs = dict.fromkeys(range(2), a)

Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.

>>> xs[0] is a and xs[1] is a
True

Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:

xs = dict((i, []) for i in range(2))

edited Aug 03 '18 at 17:56

user2357112

260,549
28
431
505

answered Nov 17 '11 at 21:47

Andrew Clark

202,379
35
273
306

2

Each entry in the resulting `xs` dictionary will have a reference to the same `a` object as its value, whether or not `a` happens to be mutable. But of course the problem in the OP only arises when `a` is mutable and you mutate it. – PM 2Ring Jul 05 '16 at 07:55

Tim Pietzcker · Answer 2 · 2011-11-17T22:07:42.923

In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.

Look at this:

>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}

In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.

As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?

Tim, I understand why it happens. My question is more like "why is it designed to behave like this?". Sorry not being clear in the question. — joaquin, Nov 17 '11 at 21:53

Mad Physicist · Answer 3 · 2018-08-03T18:45:05.680

To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.

The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.

Here are some options for creating multiple actual copies of a mutable container:

As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:
```
d = {k: [] for k in range(2)}
```
The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.
Use the form of the dict constructor suggested by @Andrew Clark:
```
d = dict((k, []) for k in range(2))
```
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
```
d = collections.defaultdict(list)
```
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:
```
for k in range(2):
    d[k].append(42)
```
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
```
d = {}
for k in range(2):
    d.setdefault(k, []).append(42)
```
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.

Good point about the new empty list object getting created every time. OTOH, they're pretty cheap to create, and the unused ones will be recycled. — PM 2Ring, Aug 03 '18 at 20:51
Recycling is overhead too. It offends my sense of neatness to do it like that, not to mention that not all possibilities are as cheap as list. — Mad Physicist, Aug 03 '18 at 21:18
I would disagree with your assessment that it "behaves like that because there is no other reasonable choice". Why would a dictionary where each element is the same make a "reasonable choice"? — Confounded, Dec 23 '20 at 15:20
@Confounded. When you are passed a reference, the only reasonable choice is to bind to that reference directly. Making a proper copy is not reasonable in python because the creation process is so customizable that in many cases it's unclear what even constitutes a copy. — Mad Physicist, Dec 23 '20 at 15:25
@Confounded. That being said, if you want to explicitly encode what it means to make a new copy, that's possible too. This answer details the different ways of doing that. — Mad Physicist, Dec 23 '20 at 15:27

Dictionary creation with fromkeys and mutable objects. A surprise

3 Answers3

Linked

Related