Choice made by Python 3.5 to choose the keys when comparing them in a dictionary

Question

When constructing a dictionary as follows:

dict = { True: 'yes', 1: 'No'}

When I run it in the interactive Python interpreter the dict is represented this way:

dict = {True: 'No'}

I understand that the values True and 1 are equal due to the type coercion because when comparing numeric types, the narrowed type is widened to the other type (boolean is a child of integer). So as I understood from the documentation, when we enter True == 1 Python converts True to 1 and compares them.

What I don't understand is why the True is selected as a key instead of 1.

I am missing something?

This is very curious. I just tried following: a = {True: 5}; a[1]; This works — Matthias Schreiber, May 11 '16 at 13:34
Nice question! I wonder if standard defines how dict literals with duplicate keys are handled or if it's just CPython implementation detail... I got a feeling it'll receive some nice answer with deep down analysis of CPython interpreter code. — Łukasz Rogalski, May 11 '16 at 13:34
It's just the *last* key-value pair that wins. Since `1 == True`, the second key overwrites the value for the first key. There is no reason to re-write the key for that operation however, only the value needs updating. — Martijn Pieters, May 11 '16 at 13:40
everyone stop answering [this question!](http://stackoverflow.com/questions/37018085/python-dictionary-doesnt-have-all-the-keys-assigned-or-items?lq=1) The OP is asking why `True` is kept as the key even though it keeps the value assigned to `1`, **this question is about "why is the mechanics of choosing the value different then choosing the key"** — Tadhg McDonald-Jensen, May 11 '16 at 13:40
Why would it rewrite the key? Keys are immutable, `True` and `1` hash and compare equal, so they are equal, `True` was in there first, so `True` it is. If you swap the order of the dict-literal, putting key `1` first, it'll be `1`. When a new item is supplied, if it is already represented in the dict (and it is, since `True` and `1` are the same), it overwrites the value - standard dict behavior. But it shouldn't overwrite the **key**, that makes no sense. — dwanderson, May 11 '16 at 13:44
why True ..instead of 1? There could be something in the language spec about order of evaluation of parameters, but the rule of thumb is: you are doing something stupid, resulting in undefined behavior. This behavior can change at any time without a notice. Whatever the reason, relying on it will break your program in the future. — Muposat, May 11 '16 at 13:51
@Muposat: there is nothing 'stupid' about what is being done here, lets keep this constructive and not imply something about the programmer. — Martijn Pieters, May 11 '16 at 13:54
Short answer: there's no benefit in replacing the old key with the new synonymous key, so Python avoids performing that unnecessary work. Of course, if you _really_ want to replace it you can do so explicitly: `del dct[True]; dct[1]='one'` — PM 2Ring, May 11 '16 at 14:29
Python dictionary are unordered `key: value` but when constructed, they are processed in order @dwanderson. That's what you wanted to say? @Muposat this is a question that came next to an other question asked about why `True` and `1` are mapped to `True` and I answered that question, so some programmers was confronted to that. And I was wondering why python 3 made this choice of the `True` value. The only question that is stupid is the one we don't ask. Thanks for everyone's helping me to understand — Ikra_5, May 11 '16 at 15:53
There is no **type coercion** happening here. `True` is an instance of (a subclass of an) `int` to begin with, and there is definitely no "widening" or such. — Antti Haapala -- Слава Україні, May 12 '16 at 06:10
@Antti Haapala to be an instance of a class and to be a subclass is very different (instatiation and enharitance) and i do think that there is a type coercion to compare the two and it's not the integer who is coerced. so if you know the mecanisme of comparing them, i'll be happy to know — Ikra_5, May 12 '16 at 09:44
The mechanisms are the `__eq__` (and `__hash__`) slots of the `int` and `bool` types, and as it turns out `bool.__eq__ is int.__eq__` and `bool.__hash__ is int.__hash__`, i.e., these methods are inherited as such, and not overridden by `bool`, and `int.__eq__`. The actual comparison happens in the `long_richcompare` of `Objects/longobject.c` in CPython 3. — Antti Haapala -- Слава Україні, May 12 '16 at 11:02
Also try: `int.__eq__(True, True)`, `int.__eq__(1, True)` and `int.__eq__(1.0, True)`. — Antti Haapala -- Слава Україні, May 12 '16 at 11:04
Thanks @Anti Haapla, may be you should add a more detailed answer to the stream — Ikra_5, May 12 '16 at 16:47

Alex Riley · Accepted Answer · 2017-07-16T12:29:51.103

Dictionaries are implemented as hash tables and there are two important concepts when adding keys/values here: hashing and equality.

To insert a particular key/value, Python first computes the hash value of the key. This hash value is used to determine the row of the table where Python should first attempt to put the key/value.

If the row of the hash table is empty, great: the new key/value can inserted into the dictionary, filling the empty row.

However, if there's already something in that row, Python needs to test the keys for equality. If the keys are equal (using ==) then they're deemed to be the same key and Python just needs to update the corresponding value on that row.

(If the keys are not equal Python looks at other rows in the table until it finds the key or reaches an empty row, but that's not relevant for this question.)

When you write {True: 'yes', 1: 'No'}, you are telling Python to create a new dictionary and then fill it with two key/value pairs. These are processed left to right: True: 'yes' then 1: 'No'.

We have hash(True) equals 1. The key True goes in at row 1 in the hash table and the string 'yes' is its value.

For the next pair, Python sees that hash(1) is also 1 and so looks at row 1 of the table. There's something already there, so now Python checks the keys for equality. We have 1 == True so 1 is deemed to be the same key as True and so its corresponding value is changed to the string 'No'.

This results in a dictionary with one entry: {True: 'No'}.

If you want to peer at the guts of CPython 3.5 to see what creating a dictionary looks below the surface-Python level, here's more detail.

The Python code {True: 'yes', 1: 'No'} is parsed into tokens and given to the compiler. Given the syntax, Python knows that a dictionary must be created using the values inside the braces. Byte code to load the four values onto the virtual machine's stack (LOAD_CONST) and then build the dictionary (BUILD_MAP) is queued up.
The four constant values are pushed onto the top of the stack in the order that they're seen:
```
'No'
1
'yes'
True
```

The opcode BUILD_MAP is then called with the argument 2 (Python counted two key/value pairs). This opcode is responsible for actually creating dictionary from the items on the stack. It looks like this:

TARGET(BUILD_MAP) {
    int i;
    PyObject *map = _PyDict_NewPresized((Py_ssize_t)oparg);
    if (map == NULL)
        goto error;
    for (i = oparg; i > 0; i--) {
        int err;
        PyObject *key = PEEK(2*i);
        PyObject *value = PEEK(2*i - 1);
        err = PyDict_SetItem(map, key, value);
        if (err != 0) {
            Py_DECREF(map);
            goto error;
        }
    }

    while (oparg--) {
        Py_DECREF(POP());
        Py_DECREF(POP());
    }
    PUSH(map);
    DISPATCH();
}

The three key steps here are as follows:

An empty hashtable is created using _PyDict_NewPresized. Small dictionaries (of just a few items, like 2 in this case) need a table with eight rows.
The for loop is entered, starting at 2 (in this case) and counting down to 0. PEEK(n) is a macro that points to the nth item down the stack. Therefore on the first iteration of the loop, we'll have

PyObject *key = PEEK(2*2);       /* item 4 down the stack */  
PyObject *value = PEEK(2*2 - 1); /* item 3 down the stack */

This means that *key will be True and *value will be 'yes' on the first loop through. On the second it will be 1 and 'No'.

PyDict_SetItem is called in each loop to put the current *key and *value into the dictionary. This is the same function that is called when you write dictionary[key] = value. It computes the hash of the key to work out where to look first in the hash table and then, if needed, compare the key to any existing key on that row (as discussed above).

@IljaEverilä: no, dicts do **not** consider types. Equality is all that is tested. Same happens with `str` vs. `unicode` objects in Python 2, since `u'foo' == 'foo'`, the result of `{u'foo': 42, 'foo': 81}` is `{u'foo': 81}`. — Martijn Pieters, May 11 '16 at 13:42
@MartijnPieters just read the docs with thought, right you are — Ilja Everilä, May 11 '16 at 13:43
@TadhgMcDonald-Jensen: objects that test equal **must** produce the same hash. The hash is then used to find the right slot in the hashtable, yes, but if the objects were equal to start with that's a given. — Martijn Pieters, May 11 '16 at 13:44
@MartijnPieters `float("nan") == float("nan") -> False` and yet `hash(float("nan")) == hash(float("nan")) -> True` — Tadhg McDonald-Jensen, May 11 '16 at 13:47
@TadhgMcDonald-Jensen: yes, that's a very rare exception, and technically speaking in violation of the requirement. See [NaNs as key in dictionaries](http://stackoverflow.com/q/6441857) for what that can lead to. — Martijn Pieters, May 11 '16 at 13:48
I never said it was a good idea, using [any floats in hash tables leads to problems](http://stackoverflow.com/questions/23721230/float-values-as-dictionary-key) but I just find it important to consider the entire mechanic since "objects that test equal must produce the same hash." will not be obvious to anyone unfarmiliar with the use of hash. — Tadhg McDonald-Jensen, May 11 '16 at 13:53

score 31 · Answer 2 · edited May 11 '16 at 17:05

Basic premise is - True and 1 have same hashes and are equal to each other - that's why they cannot be separate keys in hash table (technically inequal object with same hashes may - but hash collisions decreases performance).

>>> True == 1
True
>>> hash(1)
1
>>> hash(True)
1

Now, let's consider a bytecode:

import dis
dis.dis("Dic = { True: 'yes', 1: 'No'}")

This prints:

      0 LOAD_CONST               0 (True)
      3 LOAD_CONST               1 ('yes')
      6 LOAD_CONST               2 (1)
      9 LOAD_CONST               3 ('No')
     12 BUILD_MAP                2
     15 STORE_NAME               0 (Dic)
     18 LOAD_CONST               4 (None)
     21 RETURN_VALUE

Basically what happens is that dict literal is tokenized to keys and values, and they are pushed to stack. After that BUILD_MAP 2 coverts two pairs of (keys, values) to dictionary.

Most likely order of data on stack (which seems to be determined by order of keys in dict literal) and implementation details of BUILD_MAP decides on resulting dict keys and values.

It seems like key-value assignments are done in order defined in dict literal. Assignment behaves same as d[key] = value operation, so it's basically:

if key is not in dict (by equality): add key do dict
store value under key

Given {True: 'yes', 1: 'No'}:

Start with {}
Add (True, 'yes')
1. True is not in dict -> (True, hash(True)) == (True, 1) is new key in dict
2. Update value for key equal to 1 to 'yes'
Add (1, 'no')
1. 1 is in dict (1 == True) -> there is no need for new key in dictionary
2. Update value for key equal to 1 (True) with value 'no'

Result: {True: 'No'}

As I commented, I do not know if this is guarranted by Python specs or if it is just CPython implementation-defined behavior, it may differ in other interpreter implementations.

score 21 · Answer 3 · answered May 11 '16 at 13:37

True and 1 are different objects, but they both have the same value:

>>> True is 1
False
>>> True == 1
True

This is similar to two strings that may have the same value, but are stored in different memory locations:

>>> x = str(12345)
>>> y = str(12345)
>>> x == y 
True
>>> x is y
False

First one item is added to the dictionary; then when the other is added, that value already exists as a key. So the key is updated, key values are unique.

>>> {x: 1, y: 2}
{"12345": 2}

score 9 · Answer 4 · answered May 11 '16 at 13:38

9

If the key is already present in the dictionary, it does not override the key only the value associated.

I believe that writing x = {True:"a", 1:"b"} is along the lines of:

x = {}
x[True] = "a"
x[1] = "b"

and by the time it reaches x[1] = "b" the key True is already in the dict so why change it? why not just override the associated value.

answered May 11 '16 at 13:38

Tadhg McDonald-Jensen

20,699
5
35
59

I was just wondering why when the two keys are evaluated for equity Python choose true instead of 1, because type coercion has it rules and True is evaluated to 1 to be compared to the key 1 and the answer seems to be order of the keys. And what is interesting about that is to know if it is an implementation of the CPython or it is a Python spec – Ikra_5 May 11 '16 at 16:22
The absolutely only case where it makes a difference is when objects are considered equal (and produce same hash value) but act differently. I cannot think of a case other then `True vs 1` or `False vs 0` where this comes up, and as you have seen it makes absolutely no difference except when printing the dict out (`d[1] <==> d[True]`). – Tadhg McDonald-Jensen May 11 '16 at 16:43
imagine if every time you did `d['a'] = 4` it changed which `'a'` object was used for a key, what would be the point? This isn't a python spec so I guess that defaults it to an implementation detail but I don't see why you would want to implement it any other way... – Tadhg McDonald-Jensen May 11 '16 at 16:45
I don't want to implement it in an other way, I wanted to understand and there is nothing wrong with this behavior. Understanding that dict are constructed the way @Rogalski explained was very interesting. – Ikra_5 May 11 '16 at 16:58
I agree... But why did you comment on my question in the first place? I assumed it was to ask if I knew if it was a standard or implementation detail and I think I overreacted a bit, but did you have another reason to comment on my answer specifically? – Tadhg McDonald-Jensen May 11 '16 at 19:13
By the way if it's a spec it should be common to other implementations of Python – Ikra_5 May 12 '16 at 16:50
What ever it is, I don't recommend relying on this behaviour in actual code. – Tadhg McDonald-Jensen May 12 '16 at 16:55
I agree, it's very confusing at first glance and error prone – Ikra_5 May 12 '16 at 17:03

score 6 · Answer 5 · answered May 11 '16 at 13:32

6

The ordering seems to be the reason. Code example:

>>> d = {True: 'true', 1: 'one'}
>>> d
{True: 'one'}
>>> d = {1: 'one', True: 'true'}
>>> d
{1: 'true'}

answered May 11 '16 at 13:32

leovp

4,528
1
20
24

3

@rll: of course that's the point. The first key sets a value, then the next key is *equal* to the first, and sets a new value for that key. – Martijn Pieters May 11 '16 at 13:41
@MartijnPieters of course the order is not the point of the question.... as the many other answers show – rll May 11 '16 at 13:43
6

@rll: swapping the order illustrates what happens. For equal keys, the last key-value wins, *but only the value is used*. That the keys don't look equal to someone not familiar with bools being a subclass of int is also important, but *ordering* is very much part of the explanation. – Martijn Pieters May 11 '16 at 13:46

score 3 · Answer 6 · answered May 11 '16 at 13:55

3

It's an ambiguous statement.

Logic: d = { True: 'no', 1: 'yes'}

When python evaluates the expression, it does it sequentially, so it's equivalent to this.

d = dict() d[True] = 'no' d[1] = 'yes'

The constant True is the key, but it evaluates to 1, so you're just setting the value for the key twice.

answered May 11 '16 at 13:55

Jason Lee Eaton

168
7

did you copy this from [my answer](http://stackoverflow.com/a/37164428/5827215) or did we come up with the same reasoning around the same time? – Tadhg McDonald-Jensen May 11 '16 at 17:08
I was typing it out on my phone in the bathtub while you were posting your answer (I type slow on the phone). Noticed the duplicate when I posted, but kept it because we explained it in slightly different words so maybe it would help comprehension. – Jason Lee Eaton May 11 '16 at 17:09
2

That's... more specific then you needed to be. This still is mostly answering [this other question](http://stackoverflow.com/questions/37018085/python-dictionary-doesnt-have-all-the-keys-assigned-or-items?lq=1) about why True and 1 correspond to the same place in the dict. (In any case I'm glad you didn't plagiarize) – Tadhg McDonald-Jensen May 11 '16 at 19:09

Choice made by Python 3.5 to choose the keys when comparing them in a dictionary

6 Answers6