41

Here below when I try to hash a list, it gives me an error but works with a tuple. Guess it has something to do with immutability. Can someone explain this in detail ?

List

 x = [1,2,3]
 y = {x: 9}
  Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 TypeError: unhashable type: 'list'

Tuple

z = (5,6)
y = {z: 89}
print(y)
{(5, 6): 89}
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
Nagendra Kakarla
  • 1,011
  • 3
  • 10
  • 17
  • 1
    Yes it's about a list being mutable. If you store the hash value somewhere (as part of a `dict` for example) and then you modify the list then it's not a valid dict key anymore. That doesn't mean it makes no sense that mutable objects have hash values but that there is only no unambiguous interpretation of such a hash value and so it is up to the programmer to implement this consistently throughout the application. – a_guest Feb 13 '17 at 12:11
  • 7
    A tuple is not always hashable. (1, 2, [3]) is not hashable, because its third element is unhashable. – P. Camilleri Feb 13 '17 at 12:25
  • @P.Camilleri that implies that `hash` on a tuple is recursive. That's interesting, I didn't know that before. – Mark Ransom Dec 05 '19 at 16:15
  • @MarkRansom AFAIK, the hash for a tuple is (essentially) just calculated by first hashing each element, then doing to those results. This lets your tuple be hashable as long as the contents are each individually hashable. – CoffeeTableEspresso Apr 09 '21 at 17:27

6 Answers6

58

Dicts and other objects use hashes to store and retrieve items really quickly. The mechanics of this all happens "under the covers" - you as the programmer don't need to do anything and Python handles it all internally. The basic idea is that when you create a dictionary with {key: value}, Python needs to be able to hash whatever you used for key so it can store and look up the value quickly.

Immutable objects, or objects that can't be altered, are hashable. They have a single unique value that never changes, so python can "hash" that value and use it to look up dictionary values efficiently. Objects that fall into this category include strings, tuples, integers and so on. You may think, "But I can change a string! I just go mystr = mystr + 'foo'," but in fact what this does is create a new string instance and assigns it to mystr. It doesn't modify the existing instance. Immutable objects never change, so you can always be sure that when you generate a hash for an immutable object, looking up the object by its hash will always return the same object you started with, and not a modified version.

You can try this for yourself: hash("mystring"), hash(('foo', 'bar')), hash(1)

Mutable objects, or objects that can be modified, aren't hashable. A list can be modified in-place: mylist.append('bar') or mylist.pop(0). You can't safely hash a mutable object because you can't guarantee that the object hasn't changed since you last saw it. You'll find that list, set, and other mutable types don't have a __hash__() method. Because of this, you can't use mutable objects as dictionary keys:

>>> hash([1,2,3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Eric Duminil's answer provides a great example of the unexpected behaviour that arises from using mutable objects as dictionary keys

daveruinseverything
  • 4,775
  • 28
  • 40
  • 5
    There's nothing wrong about making mutable objects hashable, you just have to be consistent and precise about what that means within your application. You can implement the `__hash__` method for any custom class, making it hashable. You can even inherit from `list` and then make it hashable by implementing `__hash__`. It's just not unambiguous what the hash value of a mutable object is and this is why Python leaves it to the programmer to decide and define that. – a_guest Feb 13 '17 at 12:43
  • 2
    @a_guest You leave yourself open for unexpected behaviour - if your hash method doesn't depend on the contents of your object, it could be possible for two objects to share the same hash. On the flip side, if your object is mutable, you may run into situations where you set `{obj: value}` but later on cannot retrieve `mydict.get(obj)`, because the object - and therefore the hash - has changed. It's *possible* in python, and I'm sure you can find a scenario where it might even make sense … but as a general rule of thumb there *are* things wrong with making mutable objects hashable. – daveruinseverything Feb 13 '17 at 12:46
  • 4
    No you didn't get the point. First of all, "clashes" of hash values can always occur (also for immutable) objects and this is why hashtables employ linked lists in order to store objects that clash. Secondly (the "flip side"), if the hash of a mutable object changes upon modification is up to the programmer. Both situations make sense (changing the hash or not) however it needs to be consistent throughout the _application_. Because Python is a _programming language_ they decided to leave it to the programmer. – a_guest Feb 13 '17 at 12:56
  • 1
    @a_guest the question is about native datatypes, so it is not "up to the programmer". You're correct that it's perfectly possible to construct an object that is both mutable and safely washable, but that point has nothing whatsoever to do with the question - and as a general practice sounds like an incredibly esoteric use case. – daveruinseverything Mar 03 '21 at 04:08
24

Here are examples why it might not be a good idea to allow mutable types as keys. This behaviour might be useful in some cases (e.g. using the state of the object as a key rather than the object itself) but it also might lead to suprising results or bugs.

Python

It's possible to use a numeric list as a key by defining __hash__ on a subclass of list :

class MyList(list):
    def __hash__(self):
        return sum(self)

my_list = MyList([1, 2, 3])

my_dict = {my_list: 'a'}

print(my_dict.get(my_list))
# a

my_list[2] = 4  # __hash__() becomes 7
print(next(iter(my_dict)))
# [1, 2, 4]
print(my_dict.get(my_list))
# None
print(my_dict.get(MyList([1,2,3])))
# None

my_list[0] = 0  # __hash_() is 6 again, but for different elements
print(next(iter(my_dict)))
# [0, 2, 4]
print(my_dict.get(my_list))
# 'a'

Ruby

In Ruby, it's allowed to use a list as a key. A Ruby list is called an Array and a dict is a Hash, but the syntax is very similar to Python's :

my_list = [1]
my_hash = { my_list => 'a'}
puts my_hash[my_list]
#=> 'a'

But if this list is modified, the dict doesn't find the corresponding value any more, even if the key is still in the dict :

my_list << 2

puts my_list
#=> [1,2]

puts my_hash.keys.first
#=> [1,2]

puts my_hash[my_list]
#=> nil

It's possible to force the dict to calculate the key hashes again :

my_hash.rehash
puts my_hash[my_list]
#=> 'a'
Eric Duminil
  • 52,989
  • 9
  • 71
  • 124
  • 6
    Upvoted for the practical example of *why* it's not a good idea to use mutable objects for keys. – daveruinseverything Feb 13 '17 at 12:40
  • 7
    It's a misconception that using mutable objects as dict keys is a bad idea. It really depends on the definitions within your application. The only thing you _need_ to guarantee is that two objects that compare equal have the same hash value. Or the other way round: two objects with different hash values do not compare equal. Also note that a _list_ is not the only mutable type that exists. Hashable mutable objects make more sense when it comes to custom types. So this example fails somehow in the sense that it seems to set "list" equal with "mutable type". – a_guest Feb 13 '17 at 13:12
  • 2
    @a_guest Thanks for the comment. I wrote that it **might** be a bad idea. – Eric Duminil Feb 13 '17 at 13:22
  • During one minute, I didn't understand why my_dict.get(MyList([1,2,3])) is None ... – user565739 Jan 19 '19 at 06:50
  • @a_guest it's actually quite common to have objects where one part of the object acts as a key, and as long as the key is immutable it makes perfect sense to use it for the hash and exclude the rest of the object. I'm rather disappointed with C++ because it used to allow this but has changed recently. – Mark Ransom Dec 05 '19 at 16:23
  • 3
    It might be worth adding that it *may* be a good idea, as long as it is clearly understood that, when allowing mutable types as keys, we are usually using *the state of the object* as a key rather than *the object itself*. This would further emphasize @a_guest's valid point. – rd11 Aug 26 '20 at 13:10
  • 1
    @rd11 Thanks for the comment. Updated. – Eric Duminil Aug 26 '20 at 14:19
  • I just wonder, why not use `id(self)//16` as the hash value? I guess this is the default for all self-defined classes. Will that also cause a problem? – River Jan 28 '21 at 22:05
8

A hashset calculates the hash of an object and based on that hash, stores the object in the structure for fast lookup. As a result, by contract once an object is added to the dictionary, the hash is not allowed to change. Most good hash functions will depend on the number of elements and the elements itself.

A tuple is immutable, so after construction, the values cannot change and therefore the hash cannot change either (or at least a good implementation should not let the hash change).

A list on the other hand is mutable: one can later add/remove/alter elements. As a result the hash can change violating the contract.

So all objects that cannot guarantee a hash function that remains stable after the object is added, violate the contract and thus are no good candidates. Because for a lookup, the dictionary will first calculate the hash of the key, and determine the correct bucket. If the key is meanwhile changed, this could result in false negatives: the object is in the dictionary, but it can no longer be retrieved because the hash is different so a different bucket will be searched than the one where the object was originally added to.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • "[...] once an object is added to the dictionary, the hash is not allowed to change." This is a pure matter of definition and Python just decided against making lists hashable as it's not unambiguous what that means. From the Zen of Python: "In the face of ambiguity, refuse the temptation to guess." However such concepts exist as you can see from Ruby for example. – a_guest Feb 13 '17 at 12:38
  • 1
    @a_guest: no it is also a matter of efficiency. The only way you can make this work with mutable objects is adding a trigger that rehashes the object if it is modified. Most programming languages have decided against that. But evidently as they say: *every problem in computer science can be solved by another layer of indirection*. – Willem Van Onsem Feb 13 '17 at 12:40
  • @a_guest: as you can see [here](http://jafrog.com/2012/10/07/mutable-objects-as-hash-keys-in-ruby.html) you can indeed allow it, but it creates all kinds of side effects that are rather error-prone and tend to introduce bugs. – Willem Van Onsem Feb 13 '17 at 12:43
  • Only because an object is mutable doesn't mean its hash value changes upon modification. This is completely up to the programmer (meaning that you don't necessarily need to rehash). Even the opposite situation makes sense (hash changes upon modification). You just need to guarantee that two objects that compare equal have the same hash value. Or opposite: two objects with different hash values do not compare equal. – a_guest Feb 13 '17 at 13:06
  • @a_guest: yeah of course, if they had implemented the tuple to look at the `id(..)` of the elements, then they could have contained mutable elements. The only thing you have to guarantee is that the *hash* does not change after adding. Your last two statements are of course fundamental to hashing, but the second is actually a tautology of the latter: if `x => y` then `not y => not x` (that was already discovered by the ancient Greeks if I recall correctly). – Willem Van Onsem Feb 13 '17 at 13:08
  • You don't even need to guarantee that "the hash does not change after adding". Especially as an object is not aware of whether it is added to some collection or not. A definition that makes more sense is that the hash does not change for the lifetime of an object, but not even this is required. You can find scenarios where hashes changing upon modification make sense. A little aside: A _tautology_ is a single statement (not a relation between many), the other is a _(logical) deduction_. Also note that "from `x => y` follows `not x => not y`" is actually false (try with my hash example). – a_guest Feb 13 '17 at 13:24
  • @a_guest: an object is indeed not aware that it is added to a dictionary. That is why it is called a *contract*: since Python is a *Turing complete* language every non-trivial language-invariant property cannot be proven in general and thus by extent, no contract can be proven *in general*. Therefore it is up to the programmer to guarantee that the contracts are respected. For the logical tautology, apparently you did not noticed that you swapped statements in your last claim. In my first bachelor year the tautology `(x -> y) <-> (~y -> ~x)` was probably one of the first we proved to be true. – Willem Van Onsem Feb 13 '17 at 17:02
  • @a_guest: and in mathematics a statement of the form *if x then y* is actually translated into `x -> y`. One can here even say *iff* (that's `<->`). – Willem Van Onsem Feb 13 '17 at 17:08
  • @a_guest: apparantly this thing even has a name: [*contraposition*](https://en.wikipedia.org/wiki/Contraposition). – Willem Van Onsem Feb 13 '17 at 17:21
  • "Therefore it is up to the programmer to guarantee that the contracts are respected." - Exactly! So as long as a program conforms to the relevant contracts there is no problem with making mutable objects hashable or using them as dict keys for example. "Therefore it is up to the programmer to guarantee that the contracts are respected." - And this is exactly the reason why! It's up to the programmer to be aware of what it means to hash a mutable object. – a_guest Feb 14 '17 at 08:16
  • @a_guest: well yeah but that is basically what programming is all about: ensuring that state-invariants are respected, that loop-variants are respected, that pre-conditions are respected, that `__eq__` returns the opposite of `__ne__`. Furthermore Rice's theorem was proven in the late 50s if I recall. So that is nothing new. – Willem Van Onsem Feb 14 '17 at 09:05
4

I would like to add the following aspect as it's not covered by other answers already.

There's nothing wrong about making mutable objects hashable, it's just not unambiguous and this is why it needs to be defined and implemented consistently by the programmer himself (not by the programming language).

Note that you can implement the __hash__ method for any custom class which allows its instances to be stored in contexts where hashable types are required (such as dict keys or sets).

Hash values are usually used to decide if two objects represent the same thing. So consider the following example. You have a list with two items: l = [1, 2]. Now you add an item to the list: l.append(3). And now you must answer the following question: Is it still the same thing? Both - yes and no - are valid answers. "Yes", it is still the same list and "no", it has not the same content anymore.

So the answer to this question depends on you as the programmer and so it is up to you to manually implement hash methods for your mutable types.

a_guest
  • 34,165
  • 12
  • 64
  • 118
  • In Python, when are dict hashes calculated? Defining `__hash__` might not help much if it's only called when a key-value pair is added to the dict. – Eric Duminil Feb 13 '17 at 13:22
  • 1
    I don't get your point here. Staying with the list example, what's wrong with the following (apart from efficiency): `class MyList(list): def __hash__(self): return 0`. Has the exact same functionality as a list and fulfills all conditions to hash values. You can use it as a dict key without any problems (apart from dict access won't be `O(1)` anymore but this is efficiency). – a_guest Feb 13 '17 at 13:32
  • Efficient look-up is the whole point of a dictionary. A dict whose keys all have the same hash is basically a list. My question was about a non-constant hash, for example `sum(self)`. – Eric Duminil Feb 13 '17 at 13:36
3

Based on Python Glossary

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not.

MaoD
  • 121
  • 1
  • 5
2

Because a list is mutable, while a tuple is not. When you store the hash of a value in, for example, a dict, if the object changes, the stored hash value won't find out, so it will remain the same. The next time you look up the object, the dictionary will try to look it up by the old hash value, which is not relevant anymore.

To prevent that, python does not allow you to has mutable items.

blue_note
  • 27,712
  • 9
  • 72
  • 90