-4

Basically I "understand" the ambiguity with using other containers as keys. - do you compare the reference or do you compare the values in it (and how deep).

Well can't you just specialize a list and make a custom comparison/hash operator? As I know in my application I wish to compare the list-of-strings by the value of the strings (and relative ordering of course).

So how would I go about writing a custom hash for these kind of lists? Or in another way - how would I stringify the list without leading to the ambiguity that you get from introducing delimiters (that delimiter might be part of the string)?!

Regarding this question: https://wiki.python.org/moin/DictionaryKeys
There it is said directly that lists can't be used;

That said, the simple answer to why lists cannot be used as dictionary keys is that lists do not provide a valid hash method. Of course, the obvious question is, "Why not?"

So I am writing this to question if there's a way to make lists hashable; and how I would make a satisfactory Hash method.


As an example of why I would want this, see the code below:

namelist = sorted(["Jan", "Frank", "Kim"])
commonTrait = newTraits()
traitdict = {}
traitdict[namelist] = commonTrait;

//later I wish to retrieve it:
trait = traitdict[sorted(["Jan", "Frank", "Kim"])]

In this direct use I have in mind the "ordering of the list" doesn't really matter (sorting is just done in above code to make the lists always equal if they hold the same content).

Blorgbeard
  • 101,031
  • 48
  • 228
  • 272
paul23
  • 8,799
  • 12
  • 66
  • 149
  • I think a code sample would help clarify what you need. – Lev Levitsky Jul 02 '14 at 09:51
  • 2
    You can use (nested) tuples as keys *just fine*. It is *mutablility* that is the problem. A key is also still just a reference; if you can use another reference to the same object to modify the contents, then suddenly it becomes impossible to find the same key again in the hash table structure. Mutation *breaks a hash table*. – Martijn Pieters Jul 02 '14 at 09:51
  • As it stands, it's unclear exactly what you are trying to do, what you've tried so far and what you are having trouble with. – RemcoGerlich Jul 02 '14 at 09:55
  • These kind of questions are for me deciding which language to go.. Isn't it kind of silly to first create something and then notice the language doesn't have the constructs you wish? I'm evaluating it first instead - how do you do common tasks. @MartijnPieters Hmm so python simply doesn't copy the key? - but would two list whose contents are the same but constructed separately point to the same value? – paul23 Jul 02 '14 at 10:13
  • @paul23: no, the key is not simply copied; Python stores objects on a heap, everything in the language is then references to these objects. Copying a key for every operation would be prohibitively expensive, so if you want to use a mutable structure as key you'll have to explicitly make a immutable copy of it yourself. You are free to create a custom class to represent the key, and provided it has a `__eq__` and `__hash__` method that'll work too. You are then responsible for not mutating this object yourself. – Martijn Pieters Jul 02 '14 at 10:29
  • _"Programming is done on paper, not on a pc."_ Nooo, nopety no. Programming is problem solving. You write some code, find out it doesn't work, and modify it. Repeat 'till it works. There is no paper involved. – Cerbrus Jul 02 '14 at 11:15
  • 3
    This seems like an XY problem. What do you want to achieve by using a list of strings as a dictionary key, and are there any further restrictions on the contents of the lists? – l4mpi Jul 02 '14 at 11:31
  • @l4mpi Well simply as in the example, I have a group of items, which are grouped because they have a common trait. Now later on when I have a set(!) of items I simply wish to test (and get) if that list of items has a common trait/common function. – paul23 Jul 02 '14 at 13:00
  • Then you should probably just save the traits individually per item; otherwise, how would you deal with subsets et cetera? Or, if you never _have_ to deal with subsets and your collections are practically immutable, just create an immutable representation of each group and use that as the key. – l4mpi Jul 02 '14 at 13:18

1 Answers1

10

If you need to use a collection of strings as a dictionary key, you have 2 obvious choices: if the order matters, use tuple:

>>> d[tuple(['foo', 'bar'])] = 'baz'
>>> d['foo', 'bar']
baz
>>> d['bar', 'foo']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: ('bar', 'foo')

If the order must not matter, use frozenset:

>>> d[frozenset(['foo', 'bar'])] = 'baz'
>>> d[frozenset(['bar', 'foo'])]
'baz'
>>> d[frozenset(['bar', 'foo', 'bar'])]
'baz'

If the count matters but ordering does not, use sorted with a tuple:

>>> d[tuple(sorted(['foo', 'bar']))] = 'baz'
>>> d[tuple(sorted(['bar', 'foo']))]
'baz'
>>> d[tuple(sorted(['bar', 'foo', 'bar']))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: ('bar', 'bar', 'foo')

Unlike with Perl hashes, or JavaScript object properties, you do not need to stringify dictionary keys in Python.


Now, concerning the mutable list not being hashable: The Python dictionary implementation uses the hashtable structure. It specifically requires and assumes of the key that:

  1. the key implements the __hash__ method that returns an integer
  2. that the integers should be as widely spread in the output range, and uniformly mapped, as possible.
  3. that __hash__ method returns an unchanging number for the same object
  4. a == b implies that a.__hash__() == b.__hash__()

A list is not usable as a dictionary key, as:

>>> [].__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable

The list class cannot provide a __hash__ method that could satisfy all the requirements simultaneously a == b needs to imply that a.__hash__() == b.__hash__().

(It *could provide an implementation that returns 0 for each list, and it would work correctly then, but it would refute the use of hashing altogether, as all lists would be mapped to the same slot in the dictionary, as the hash codes would break the rule 2).

It is also not possible to create a __hash__ method for a list:

>>> [].__hash__ = lambda x: 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object attribute '__hash__' is read-only

Of course we can always see what would happen if list would have the __hash__ method - we create a subclass of list and provide __hash__ there; the obvious implementation for the hash code would be that of the tuple():

>>> class hashablelist(list):
...     def __hash__(self):
...         return hash(tuple(self))
... 
>>> x = hashablelist(['a', 'b', 'c'])
>>> y = hashablelist(['a', 'b', 'd'])
>>> d = {}
>>> d[x] = 'foo'
>>> d[y] = 'bar'
>>> d.items()
[(['a', 'b', 'c'], 'foo'), (['a', 'b', 'd'], 'bar')]
>>> y[2] = 'c'
>>> d.items()
[(['a', 'b', 'c'], 'foo'), (['a', 'b', 'c'], 'bar')]
>>> del d[x]
>>> del d[y]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: ['a', 'b', 'c']
>>> d.items()
[(['a', 'b', 'c'], 'bar')]
>>> x in d
False
>>> y in d
False
>>> x in d.keys()
True
>>> y in d.keys()
True

The code shows that we just managed to get a broken dictionary as a result - there is no way of accessing or removing the ['a', 'b', 'c'] -> 'bar' pair directly by the key, even though it is visible in .keys(), .values() and .items().