A common issue on SO is removing duplicates from a list of lists. Since lists are unhashable, set([[1, 2], [3, 4], [1, 2]]) raises TypeError: unhashable type: 'list'. Answers to this kind of question usually involve converting the inner lists to tuples, which are immutable and therefore hashable.
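For context, the usual workaround looks roughly like this (a minimal sketch; lst is just a stand-in name, and the order of the survivors is not guaranteed since it depends on the set's internal layout):
>>> lst = [[1, 2], [3, 4], [1, 2]]
>>> [list(t) for t in set(tuple(x) for x in lst)]
[[1, 2], [3, 4]]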
This answer to "What makes lists unhashable?" includes the following:
If the hash value changes after it gets stored at a particular slot in the dictionary, it will lead to an inconsistent dictionary. For example, initially the list would have gotten stored at location A, which was determined based on the hash value. If the hash value changes, and if we look for the list we might not find it at location A, or as per the new hash value, we might find some other object.
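Concretely, I take that to mean something like the following sketch, using a throwaway list subclass that borrows tuple's hash (essentially the same idea as the my_list class I show further down); the exact traceback is from a CPython 2.7 session and details may vary:
>>> class hashable_list(list):     # throwaway subclass, just for illustration
...     def __hash__(self):
...         return hash(tuple(self))
...
>>> d = {}
>>> key = hashable_list([1, 2])
>>> d[key] = 'foo'
>>> key.append(3)                  # mutating the key changes its hash
>>> d[key]                         # the lookup probes the slot for the new hash and misses
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]
>>> key in d
False
>>> d                              # yet the entry is still sitting in its old slot
{[1, 2, 3]: 'foo'}
The value is still physically in the dictionary, but neither d[key] nor key in d can reach it any more, because both probe based on the new hash.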
But I don't quite understand why that matters, because other types that can be used as dictionary keys can apparently be changed without issue:
>>> d = {}
>>> a = 1234
>>> d[a] = 'foo'
>>> a += 1
>>> d[a] = 'bar'
>>> d
{1234: 'foo', 1235: 'bar'}
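The two keys really do hash differently (ints this small hash to their own value in CPython):
>>> hash(1234), hash(1235)
(1234, 1235)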
So it is obvious that when the value of a changes, the new value hashes to a different location in the dictionary. Why is the same behaviour dangerous for a list? And why is the following an unsafe way to hash a list, given that it is essentially what we all do, via tuples, whenever we need to anyway?
>>> class my_list(list):
...     def __hash__(self):
...         return tuple(self).__hash__()
...
>>> a = my_list([1, 2])
>>> b = my_list([3, 4])
>>> c = my_list([1, 2])
>>> foo = [a, b, c]
>>> foo
[[1, 2], [3, 4], [1, 2]]
>>> set(foo)
set([[1, 2], [3, 4]])
This seems to solve the set() problem, so why is it an issue? Lists may be mutable, but they are ordered, which seems like all that should be needed for hashing.
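For what it's worth, the subclass also appears to behave itself as a dictionary key, at least as long as nothing is mutated (continuing the session above, where a and c are equal but distinct my_list objects):
>>> d = {a: 'first', b: 'second'}
>>> d[c]              # c == a and hashes identically, so it finds a's entry
'first'
>>> len(set(foo))     # still deduplicated
2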