If you need to use a collection of strings as a dictionary key, you have 2 obvious choices: if the order matters, use tuple
:
>>> d[tuple(['foo', 'bar'])] = 'baz'
>>> d['foo', 'bar']
baz
>>> d['bar', 'foo']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: ('bar', 'foo')
If the order must not matter, use frozenset
:
>>> d[frozenset(['foo', 'bar'])] = 'baz'
>>> d[frozenset(['bar', 'foo'])]
'baz'
>>> d[frozenset(['bar', 'foo', 'bar'])]
'baz'
If the count matters but ordering does not, use sorted
with a tuple
:
>>> d[tuple(sorted(['foo', 'bar']))] = 'baz'
>>> d[tuple(sorted(['bar', 'foo']))]
'baz'
>>> d[tuple(sorted(['bar', 'foo', 'bar']))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: ('bar', 'bar', 'foo')
Unlike with Perl hashes, or JavaScript object properties, you do not need to stringify dictionary keys in Python.
Now, concerning the mutable list
not being hashable: The Python dictionary implementation uses the hashtable structure. It specifically requires and assumes of the key that:
- the key implements the
__hash__
method that returns an integer
- that the integers should be as widely spread in the output range, and uniformly mapped, as possible.
- that
__hash__
method returns an unchanging number for the same object
a == b
implies that a.__hash__() == b.__hash__()
A list is not usable as a dictionary key, as:
>>> [].__hash__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable
The list
class cannot provide a __hash__
method that could satisfy all the requirements simultaneously a == b
needs to imply that a.__hash__() == b.__hash__()
.
(It *could provide an implementation that returns 0 for each list, and it would work correctly then, but it would refute the use of hashing altogether, as all lists would be mapped to the same slot in the dictionary, as the hash codes would break the rule 2).
It is also not possible to create a __hash__
method for a list:
>>> [].__hash__ = lambda x: 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list' object attribute '__hash__' is read-only
Of course we can always see what would happen if list
would have the __hash__
method - we create a subclass of list and provide __hash__
there; the obvious implementation for the hash code would be that of the tuple()
:
>>> class hashablelist(list):
... def __hash__(self):
... return hash(tuple(self))
...
>>> x = hashablelist(['a', 'b', 'c'])
>>> y = hashablelist(['a', 'b', 'd'])
>>> d = {}
>>> d[x] = 'foo'
>>> d[y] = 'bar'
>>> d.items()
[(['a', 'b', 'c'], 'foo'), (['a', 'b', 'd'], 'bar')]
>>> y[2] = 'c'
>>> d.items()
[(['a', 'b', 'c'], 'foo'), (['a', 'b', 'c'], 'bar')]
>>> del d[x]
>>> del d[y]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: ['a', 'b', 'c']
>>> d.items()
[(['a', 'b', 'c'], 'bar')]
>>> x in d
False
>>> y in d
False
>>> x in d.keys()
True
>>> y in d.keys()
True
The code shows that we just managed to get a broken dictionary as a result - there is no way of accessing or removing the ['a', 'b', 'c'] -> 'bar'
pair directly by the key, even though it is visible in .keys()
, .values()
and .items()
.