0

An object that is hashable needs a __hash__ method and it has a hash value which never changes during its lifetime.

Python lists are not hashable for reasons that I totally ignore and I wonder if the following implementation is OK or if it has some glitches that I am not aware of.

class L(list):
    def __hash__(self):
        return id(self)

 a = range(10)
 l = L(range(10))
 print a
 >> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 print l
 >> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 hash(l)
 >> 41889288
 hash(a) # unsurprisingly it returns an error because a list is not hashable
 >> Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'

 # lets append a value to l
 l.append(1000)
 print l
 >> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
 isinstance(l, list) # to double check whether l is a List instance
 >> True
 D = {}
 D[l] = "this is a dict with a list as key"
 print D
 {[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]: 'this is a dict with a list as key'}
 # lets append another value to l
 l.append(-10)
 print D
 >> {[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000, -10]: 'this is a dict with a list as key'}

 # now lets add a normal list to the dict to the dict
 D[a] = "Add a list as key"
 >>Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   TypeError: unhashable type: 'list'

So this a hashable list implementation that is pretty much working with no issues and I don't see why a list can't be hashable in the normal Python distribution even though it is still mutable.

NB :This question is not about why a list can't be used as a dictionary key Any explanation?

Cobry
  • 4,348
  • 8
  • 33
  • 49
  • 5
    What does it even _mean_ for something to be hashable and mutable? If the list changes and the hash does not, how can you meaningfully use it as a dict key? The whole notion of dicts and hash tables breaks down here. The hash is what is used to distinguish one key from another. If the key changes and the hash does not, what is even happening? – Two-Bit Alchemist May 29 '15 at 19:35
  • 2
    In other words, under your notion, say I create a dict `d`, and assign `d[[1,2,3]] = 'alpha'`. (That is, the key `[1, 2, 3]` refers to `'alpha'`.) Then I append to the list. Is `d[[1, 2, 3, 4]]` still alpha?? What if I make a new list `[1, 2, 3, 4]` which presumably has a different hash then add it to the dictionary? What then? – Two-Bit Alchemist May 29 '15 at 19:37
  • technically you are right, but the key is not changing if you see it as an instance. its value is changing but not the instance itself – Cobry May 29 '15 at 19:38
  • 2
    Googling "python list as dictionary key" led me to [this SO question](http://stackoverflow.com/questions/7257588/why-cant-i-use-a-list-as-a-dict-key-in-python), which led me to [this Python Wiki entry](https://wiki.python.org/moin/DictionaryKeys) with a nice explanation of why a mutable data type like `list` isn't a good choice for a dictionary key. – TigerhawkT3 May 29 '15 at 19:39
  • 3
    What's the *point* of it being hashable? Lists aren't unhashable because it's hard, but because it doesn't make sense. – jonrsharpe May 29 '15 at 19:39
  • @Cobry I'm with you. I imagined instantly using the id for a hash. Now imagine you are a programmer looking at code that obeys the principle I see happening in my second comment. Two identical-looking dict keys returning different values. What's even going on there? What's the benefit? Why would you want that? – Two-Bit Alchemist May 29 '15 at 19:42

2 Answers2

7

You have a __hash__ method. It returns something. Does it return something useful? Let's see.

>>> a = L([1, 2])
>>> b = L([1, 2])
>>> a == b
True
>>> s = set([a, b])
>>> s
set([[1, 2], [1, 2]])

Nope! It breaks the assumptions made by any code that uses hash, specifically that equal objects have equal hashes. Now we have a set with two equal objects in it. You can sort of fix this by making == and != compare by identity, though then you also lose total order and you have to remove <, >, <=, and >=. This is a lot of useful stuff you'd have to take out to make lists meaningfully hashable.

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • range(10) == range(10) also return True but range(10) is range(10) returns False the same way L(range(10)) is L(range(10)) returns False I don't see what breaks down here ... – Cobry May 29 '15 at 19:47
  • @Cobry: Yup, but you can't stick those in a set, so you can't get both of them into the same set and break the set API. – user2357112 May 29 '15 at 19:48
  • @user2357112: well, in Python 3 you can. ;-) – DSM May 29 '15 at 19:49
  • 2
    @Cobry: Also, it's not assumed that equal objects have equal IDs. What is assumed is that for any objects `a` and `b`, `a == b` implies `hash(a) == hash(b)`. This property is the entire point of hashes and the entire reason they're useful; your `__hash__` does not have this property. – user2357112 May 29 '15 at 19:58
0

You can - with limitations - use a JSON dump to compare content-wise quality. One limitation is that this only works for any JSON-serializable data.

import json

class HashableList(list):
    def __hash__(self):
        return hash(json.dumps(self, sort_keys=True))

This works reasonable well for most cases. In any case, you need to ensure that this works for the data you are using it for.

sebix
  • 2,943
  • 2
  • 28
  • 43