21

I came across this behavior that surprised me in Python 2.6 and 3.2:

>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}

However, dict comprehensions in 3.2 show a more polite demeanor:

>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>> 

Why does fromkeys behave like that?

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
joaquin
  • 82,968
  • 29
  • 138
  • 152
  • 1
    the difference is the same as in `[[]]*2` and `[[] for _ in range(2)]`. – jfs Nov 17 '11 at 21:51
  • @J.F.Sebastian I am used to the meaning of [[]]*2 and other gotchas alike. But fromkeys got me by surprise. Maybe is just a question of familiarity...I practically never use the fromkeys method... – joaquin Nov 17 '11 at 22:01

3 Answers3

22

Your Python 2.6 example is equivalent to the following, which may help to clarify:

>>> a = []
>>> xs = dict.fromkeys(range(2), a)

Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.

>>> xs[0] is a and xs[1] is a
True

Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:

xs = dict((i, []) for i in range(2))
user2357112
  • 260,549
  • 28
  • 431
  • 505
Andrew Clark
  • 202,379
  • 35
  • 273
  • 306
  • 2
    Each entry in the resulting `xs` dictionary will have a reference to the same `a` object as its value, whether or not `a` happens to be mutable. But of course the problem in the OP only arises when `a` is mutable and you mutate it. – PM 2Ring Jul 05 '16 at 07:55
7

In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.

Look at this:

>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}

In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.

As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?

Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561
  • 4
    Tim, I understand why it happens. My question is more like "why is it designed to behave like this?". Sorry not being clear in the question. – joaquin Nov 17 '11 at 21:53
2

To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.

The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.

Here are some options for creating multiple actual copies of a mutable container:

  1. As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:

    d = {k: [] for k in range(2)}
    

    The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.

  2. Use the form of the dict constructor suggested by @Andrew Clark:

    d = dict((k, []) for k in range(2))
    

    This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.

  3. Use a collections.defaultdict instead of a regular dict:

    d = collections.defaultdict(list)
    

    This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:

    for k in range(2):
        d[k].append(42)
    

    Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.

  4. Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:

    d = {}
    for k in range(2):
        d.setdefault(k, []).append(42)
    

    The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
  • Good point about the new empty list object getting created every time. OTOH, they're pretty cheap to create, and the unused ones will be recycled. – PM 2Ring Aug 03 '18 at 20:51
  • Recycling is overhead too. It offends my sense of neatness to do it like that, not to mention that not all possibilities are as cheap as list. – Mad Physicist Aug 03 '18 at 21:18
  • I would disagree with your assessment that it "behaves like that because there is no other reasonable choice". Why would a dictionary where each element is the same make a "reasonable choice"? – Confounded Dec 23 '20 at 15:20
  • @Confounded. When you are passed a reference, the only reasonable choice is to bind to that reference directly. Making a proper copy is not reasonable in python because the creation process is so customizable that in many cases it's unclear what even constitutes a copy. – Mad Physicist Dec 23 '20 at 15:25
  • @Confounded. That being said, if you want to explicitly encode what it means to make a new copy, that's possible too. This answer details the different ways of doing that. – Mad Physicist Dec 23 '20 at 15:27