1

Consider the following Python interpreter shell session:

>>> class D(dict):
...   def __hash__(self):
...     return id(self)
... 
>>> d1 = D({'a': 'b'})
>>> d2 = D({'a1': 'b1'})
>>> t = {d1: 1, d2: 2}
>>> t[d1]
1
>>> t[d2]
2

Why doesn't dict's __hash__ default to id()? What led to the design decision of forbidding to use mutable entities as dictionary keys?

d33tah
  • 10,999
  • 13
  • 68
  • 158
  • Also see [here](http://stackoverflow.com/questions/31340756/python-why-can-i-put-mutable-object-in-a-dict-or-set), [here](http://stackoverflow.com/questions/6310867/why-arent-python-sets-hashable), [here](http://stackoverflow.com/questions/2671376/hashable-immutable), [here](http://stackoverflow.com/questions/14535730/what-do-you-mean-by-hashable-in-python)... – TigerhawkT3 May 12 '16 at 21:42
  • I would consider that, as a container, a dict does not have a very tangible identity outside of its content. 2 dict with the same content have the same hash and are equal. It makes more sense to be able to compare `{'a': 1} == {'a' == 1}` – njzk2 May 12 '16 at 21:44
  • @TigerhawkT3 I think the question is more a design question. Why are the hash functions of mutable containers dependant on the content ? – njzk2 May 12 '16 at 21:45
  • @TigerhawkT3: I believe that I'm asking a different question and none of the linked answers relate to it. Basically I'm wondering why hashing by the id() and allowing for making dict hashable is clearly not welcome. – d33tah May 12 '16 at 21:46
  • The reasoning for the design decision is discussed ad nauseam in the linked questions, and there's more on Google. – TigerhawkT3 May 12 '16 at 21:46
  • 1
    None of the answers to linked questions seems to answer my question directly. If there's a Google answer, I'd be grateful for pointing me to it. – d33tah May 12 '16 at 21:47
  • Please spend some time searching SO (via Google as well as SO's built-in search functionality) before posting a question in order to avoid creating a duplicate. – TigerhawkT3 May 12 '16 at 21:47
  • 3
    I *did* browse and I'm getting the impression you are blindly assuming that any of the linked questions does have an answer. – d33tah May 12 '16 at 21:48
  • See the comment on the answer in the second ["here"](http://stackoverflow.com/questions/6310867/why-arent-python-sets-hashable) from my first comment to see an explanation in the [official documentation](https://docs.python.org/3.3/faq/design.html#why-must-dictionary-keys-be-immutable) about why dictionary keys must be immutable. – TigerhawkT3 May 12 '16 at 21:50
  • 1
    http://effbot.org/pyfaq/why-must-dictionary-keys-be-immutable.htm – Padraic Cunningham May 12 '16 at 21:52
  • 1
    I don't understand why you don't want to read any of the many existing answers or outside resources. Don't you want an answer to your question? – TigerhawkT3 May 12 '16 at 21:55
  • 1
    here is explanation from Python documentation https://docs.python.org/2/faq/design.html#why-must-dictionary-keys-be-immutable – Hayk Davtyan May 12 '16 at 22:00

3 Answers3

7

Why doesn't dict's __hash__ default to id()?

Because that violates the fundamental invariant that equal objects have equal hashes. If dicts used their id for their hash, then you'd have interactions like the following:

>>> x, y = {}, {}
>>> x == y
True
>>> hash(x) == hash(y)
False
>>> x[{}] = 3
>>> x[{}]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: {}

The behavior would be confusing, inconsistent, and not useful.

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • I would suppose that invocation `x[{}]` would raise `KeyError`. It would be ok for me if we can index by reference types and a key comparison would be the reference comparison. Could you provide some REALLY confusing example? I used this technique in another programming languages and consider it as a convinient alternative for plenty `switch` statements. – pt12lol May 13 '16 at 07:03
  • 1
    @pt12lol: Every type in Python is a reference type, so making key comparison work by reference for reference types means that after `x[1000] = 4`, `x[1000]` is also a `KeyError`. If you want a dict that works by identity, you can implement that on top of the built-in dict type, and in fact, mutable objects that use `is` for `==` will behave like you want by default, but having dicts use `is` instead of `==` depending on characteristics of the key type would be much more confusing than helpful. – user2357112 May 13 '16 at 07:20
3

dict's __hash__ doesn't default to id() because it would defeat the fundamental purpose of hash tables.

As linked in this comment on another question's answer:

And see the Python FAQ entry Why must dictionary keys be immutable?

The linked documentation says the following (with additional explanation for those who want to read more):

The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.

Emphasis added.

Basically, the hash table is only useful if it only holds immutable values. Workarounds like id() defeat the entire purpose, because (as explained in the bold portion of the above quote) you wouldn't find values that you tried to look up.

Community
  • 1
  • 1
TigerhawkT3
  • 48,464
  • 6
  • 60
  • 97
1

In general it is more useful and intuitive to have equality based on value and not identity. For example, consider the following snippet:

from collections import Counter
words = 'dog cat dog'.split()
print Counter(words)  # Counter({'dog': 2, 'cat': 1})

That makes sense. It is expected. Now, if things worked your way, we'd have this:

dicts = [{}, {}]
print Counter(dicts)  # Counter({{}: 1, {}: 1})

That is NOT the first thing I would expect. You could explain it to me, but the first time I came across it (especially if I was new to programming) it would probably cause me quite some frustration while debugging. And even after understanding it it would probably catch me off guard once in a while. So while the language could be designed in your way, it would not be user friendly.

Similarly, dictionaries could be hashed based on content, and that would make the above snippet behave more intuitively. But now if I mutate the key I'm in serious trouble. I could be warned not to, but rather have the language protect me from making this kind of mistake than giving me all the freedom I want to shoot myself in the foot.

A better way to go would be to use an immutable dict, which you can get here.

Alex Hall
  • 34,833
  • 5
  • 57
  • 89