53

In "How to hash lists?" I was told that I should convert the list to a tuple first, e.g. [1,2,3,4,5] to (1,2,3,4,5).

So the first cannot be hashed, but the second can. Why*?


*I am not really looking for a detailed technical explanation, but rather for an intuition

Mazdak
gsamaras

6 Answers

70

Mainly, because tuples are immutable. Assume the following works:

>>> l = [1, 2, 3]
>>> t = (1, 2, 3)
>>> x = {l: 'a list', t: 'a tuple'}

Now, what happens when you do l.append(4)? You've modified the key in your dictionary! From afar! If you're familiar with how hashing algorithms work, this should frighten you. Tuples, on the other hand, are absolutely immutable. t += (1,) might look like it's modifying the tuple, but really it's not: it simply creates a new tuple, leaving your dictionary key unchanged.
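The last point can be checked directly. This is a minimal sketch; the names `original` and `d` are illustrative and not part of the answer. It shows that `t += (1,)` rebinds `t` to a new tuple object, while the key stored in the dictionary stays as it was.

```python
t = (1, 2, 3)
d = {t: 'a tuple'}
original = t              # second reference to the object used as the key

t += (1,)                 # builds a new tuple and rebinds the name t

print(t)                  # (1, 2, 3, 1)
print(original)           # (1, 2, 3)
print(t is original)      # False
print(d[(1, 2, 3)])       # a tuple
print(list(d))            # [(1, 2, 3)]
```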

gsamaras
val
  • val, great explanation! What are you trying to say here: From afar ! ? Do you think the question was so bad to get a downvote? – gsamaras May 10 '16 at 11:17
  • I mean that you've modified the key of your dictionary from _outside_ the dictionary: since hashtables rely on the 1:1(ish) correspondence of keys and hashes, modifying the key behind the hash's back is a very bad idea indeed. – val May 10 '16 at 11:19
  • 4
    You've not really said why modifying a key is bad -- because it changes the hash value of the key, meaning the place where the key/value pair is stored becomes invalid, meaning you can't retrieve the key/value pair any more. Also, hashtables will work with a ∞:1 key to hash correspondence (all keys having the same hash value). All that is affected is their performance. – Dunes May 10 '16 at 11:30
  • 2
    @Dunes can you expand on that? – gsamaras May 10 '16 at 11:35
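To illustrate the point raised in the comments about an ∞:1 key-to-hash correspondence, here is a minimal sketch. The class `BadHash` is hypothetical and exists only for this example: every instance returns the same hash value, yet dictionary lookups stay correct, because the dict falls back on `__eq__` to tell colliding keys apart. Only speed suffers.

```python
class BadHash:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1  # every instance collides with every other

    def __eq__(self, other):
        return isinstance(other, BadHash) and self.name == other.name


d = {BadHash(str(i)): i for i in range(100)}

# Lookups still succeed; the dict compares colliding keys with __eq__,
# so the cost is extra comparisons, not wrong answers.
print(len(d))              # 100
print(d[BadHash("42")])    # 42
```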
11

You could totally make that work, but I bet you wouldn't like the effects.

from functools import reduce
from operator import xor

class List(list):
    def __hash__(self):
        # XOR of all elements; the initial 0 keeps an empty List hashable
        return reduce(xor, self, 0)

Now let's see what happens:

>>> l = List([23,42,99])
>>> hash(l)
94
>>> d = {l: "Hello"}
>>> d[l]
'Hello'
>>> l.append(7)
>>> d
{[23, 42, 99, 7]: 'Hello'}
>>> l
[23, 42, 99, 7]
>>> d[l]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: [23, 42, 99, 7]

edit: So I thought about this some more. You could make the above example work, if you return the list's id as its hash value:

class List(list):
    def __hash__(self):
        return id(self)

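In that case, d[l] will give you 'Hello', but neither d[[23,42,99,7]] nor d[List([23,42,99,7])] will (because you're creating a new [Ll]ist). The first raises a TypeError, since a plain list is still unhashable; the second raises a KeyError.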
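The identity-based variant can be tried out as follows. This is a sketch that reuses the `List` class from the edit; the behaviour shown is what CPython does, where two objects alive at the same time always have different ids.

```python
class List(list):
    def __hash__(self):
        return id(self)  # identity-based hash, as in the edit above


l = List([23, 42, 99])
d = {l: "Hello"}
l.append(7)

print(d[l])  # Hello (same object, so same id and same hash)

try:
    d[List([23, 42, 99, 7])]   # equal contents, but a different object
except KeyError:
    print("KeyError for an equal but distinct List")

try:
    d[[23, 42, 99, 7]]         # a plain list is still unhashable
except TypeError as exc:
    print("TypeError:", exc)
```

The price of this design is that the dictionary now looks keys up by identity rather than by value, which is rarely what you want from a list-like key.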

L3viathan
9

Since a list is mutable, modifying it would change its hash too, which ruins the point of having a hash (as in a set or a dict key).

Edit: I'm surprised this answer regularly gets new upvotes; it was written really quickly. I feel I need to make it better now.

The built-in set and dict data structures are implemented as hash tables. A data type in Python may define the magic method __hash__(), which is used when building the hash table and when looking entries up in it.

Among the built-in types, only the immutable ones (int, str, tuple, ...) are hashable, and their hash value is based on the data and not on the identity of the object. You can check this by

>>> a = (0,1)
>>> b = (0,1)
>>> a is b
False # Different objects
>>> hash(a) == hash(b)
True # Same hash

If we follow this logic, mutating the data would change the hash, but then what's the point of a hash that changes? It defeats the whole purpose of sets, dicts and any other use of hashes.

Fun fact: if you try the example with short strings or with ints -5 <= i <= 256, a is b returns True because of micro-optimizations (in CPython at least).
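One way to see which built-in types take part in hashing is to look at their `__hash__` attribute. This is a small sketch, not part of the original answer: in Python 3 the mutable built-in containers set `__hash__` to `None`, while the immutable ones keep a value-based hash.

```python
# Mutable built-in containers opt out of hashing by setting __hash__ to None.
print(list.__hash__ is None)    # True
print(dict.__hash__ is None)    # True
print(set.__hash__ is None)     # True

# Immutable built-ins keep a real __hash__ that depends on the value.
print(tuple.__hash__ is None)                               # False
print(hash((0, 1)) == hash((0, 1)))                         # True
print(hash(frozenset({1, 2})) == hash(frozenset({2, 1})))   # True
```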

polku
  • can you provide more legal citation and content on your sayings? – Vicrobot Aug 14 '18 at 20:33
  • 1
    Better late than never, I guess: the O'Reilly book "High Performance Python" has a description of how the built-in data structures are implemented. – polku Jan 07 '20 at 10:46
6

Because lists are mutable and tuples aren't.

Daniel Roseman
2

The answers are good. The reason is mutability. If we could use a list (or any other mutable object) as a dict key, we could change the key by mutating it, either accidentally or intentionally. That would change the hash value of the key in the dictionary, so we would no longer be able to retrieve the value from the data structure by that key. Hash values and hash tables make large collections easy to search by mapping each key to the index of the slot that stores the actual entry.

Read more about them here:

Hash Tables & Hash Functions & Associative Arrays

Vicrobot
1

Not every tuple is hashable. For example, a tuple that contains a list as an element is not:

x = (1, [2, 3])
print(type(x))  # <class 'tuple'>
print(hash(x))  # raises TypeError: unhashable type: 'list'
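Put differently, a tuple is hashable only if every element in it is hashable, at any nesting depth. A minimal sketch of that rule follows; `is_hashable` is a hypothetical helper written for this example, not a built-in.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable((1, (2, 3))))      # True: tuples all the way down
print(is_hashable((1, [2, 3])))      # False: contains a list
print(is_hashable((1, ([2], 3))))    # False: the list is nested deeper
```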