7

A common issue on SO is removing duplicates from a list of lists. Since lists are unhashable, set([[1, 2], [3, 4], [1, 2]]) throws TypeError: unhashable type: 'list'. Answers to this kind of question usually involve using tuples, which are immutable and therefore hashable.

This answer to What makes lists unhashable? include the following:

If the hash value changes after it gets stored at a particular slot in the dictionary, it will lead to an inconsistent dictionary. For example, initially the list would have gotten stored at location A, which was determined based on the hash value. If the hash value changes, and if we look for the list we might not find it at location A, or as per the new hash value, we might find some other object.

but I don't quite understand because other types that can be used for dictionary keys can be changed without issue:

>>> d = {}
>>> a = 1234
>>> d[a] = 'foo'
>>> a += 1
>>> d[a] = 'bar'
>>> d
{1234: 'foo', 1235: 'bar'}

It is obvious that if the value of a changes, it will hash to a different location in the dictionary. Why is the same assumption dangerous for a list? Why is the following an unsafe method for hashing a list, since it is what we all use when we need to anyway?

>>> class my_list(list):
...   def __hash__(self):
...     return tuple(self).__hash__()
...
>>> a = my_list([1, 2])
>>> b = my_list([3, 4])
>>> c = my_list([1, 2])
>>> foo = [a, b, c]
>>> foo
[[1, 2], [3, 4], [1, 2]]
>>> set(foo)
set([[1, 2], [3, 4]])

It seems that this solves the set() problem, why is this an issue? Lists may be mutable, but they are ordered which seems like it would be all that's needed for hashing.

Community
  • 1
  • 1
Will
  • 4,299
  • 5
  • 32
  • 50
  • 3
    You are not modifying the stored **key**, you are assigning a **new object** to `a`. – Martijn Pieters Aug 23 '16 at 15:11
  • 3
    These aren't the same at all. In the first case you simply use a *different* object: 1235 instead of 1234. It's not the same object mutated, which would be the case for the list. – Daniel Roseman Aug 23 '16 at 15:11
  • Possible duplicate of [What makes lists unhashable?](http://stackoverflow.com/questions/23268899/what-makes-lists-unhashable) – Robert Columbia Aug 23 '16 at 15:13
  • 1
    Your number example would be correct if `a += 1; print(d[a])` would result in `'foo'`. It does not. However, with a mutated list it would (if mutating lists could be used as keys, which they cannot, because exactly this wouldn't make sense). – deceze Aug 23 '16 at 15:16

1 Answers1

19

You seem to confuse mutability with rebinding. a += 1 assigns a new object, the int object with the numeric value 1235, to a. Under the hood, for immutable objects like int, a += 1 is just the same as a = a + 1.

The original 1234 object is not mutated. The dictionary is still using an int object with numeric value 1234 as the key. The dictionary still holds a reference to that object, even though a now references a different object. The two references are independent.

Try this instead:

>>> class BadKey:
...     def __init__(self, value):
...         self.value = value
...     def __eq__(self, other):
...         return other == self.value
...     def __hash__(self):
...         return hash(self.value)
...     def __repr__(self):
...         return 'BadKey({!r})'.format(self.value)
...
>>> badkey = BadKey('foo')
>>> d = {badkey: 42}
>>> badkey.value = 'bar'
>>> print(d)
{BadKey('bar'): 42}

Note that I altered the attribute value on the badkey instance. I didn't even touch the dictionary. The dictionary reflects the change; the actual key value itself was mutated, the object that both the name badkey and the dictionary reference.

However, you now can't access that key anymore:

>>> badkey in d
False
>>> BadKey('bar') in d
False
>>> for key in d:
...     print(key, key in d)
...
BadKey('bar') False

I have thoroughly broken my dictionary, because I can no longer reliably locate the key.

That's because BadKey violates the principles of hashability; that the hash value must remain stable. You can only do that if you don't change anything about the object that the hash is based on. And the hash must be based on whatever makes two instances equal.

For lists, the contents make two list objects equal. And you can change those, so you can't produce a stable hash either.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Great explanation, I was confused about keys being a reference, I had assumed it was a stored hash value that was calculated when hashed and stored entirely within the dictionary. Thanks! – Will Aug 23 '16 at 15:33
  • 1
    @Will: well, yes, in that the key is slotted in a hash table in the slot determined by the hash value (hash modulo table size), but because that can lead to *clashes* (multiple keys hashing to the same slot) you also need to test for equality when you try to match a key again (`d[key]`, `key in d`, d.get(key), del d[key], d.pop(key) all need to match the key first). So when you slotted an object and then *their hash changes* they are in the wrong place and you can't find that place again. Another object with the old hash won't test equal anymore. – Martijn Pieters Aug 23 '16 at 15:43
  • I guess you could also just hash the "address" of a list `L` via `id(L)` and use the lists themselves as a key rather than its contents. That way, if the list changes, you can still access your dictionary entry via the list. This may not be desired behaviour in most cases though and I'm not sure how stable `id(L)` is. Maybe someone can elaborate if it isn't stable. – Pharoah Jardin Jul 10 '23 at 08:43
  • @PharoahJardin: `id(L)` is stable as long as the list object has references to it. The point of hashable keys is that you can use a different immutable object with the *same value* and have it still act as the same key. `id(L)` will only work for a single object, and cannot be recreated from the contents of that list object. – Martijn Pieters Jul 15 '23 at 21:57
  • Thank you. I think in most cases where you want to track a list independently of its contents and associate it with some values, you’re probably better off using a wrapper class for both the key and value(s) anyway. – Pharoah Jardin Jul 17 '23 at 05:46