29

Following on from this question, I'm interested to know when is a python object's hash computed?

  1. At an instance's __init__ time,
  2. The first time __hash__() is called,
  3. Every time __hash__() is called, or
  4. Any other opportunity I might be missing?

May this vary depending on the type of the object?

Why does hash(-1) == -2 whilst other integers are equal to their hash?

Community
  • 1
  • 1
wim
  • 338,267
  • 99
  • 616
  • 750
  • 3 is out as I've [read](http://www.laurentluce.com/posts/python-dictionary-implementation/) it is cached the first time it is called. I would assume the 2nd option is right but as I'm not sure I won't post it as an answer :) – rplnt Oct 04 '11 at 12:51
  • @rplnt: wrong; that's merely when talking of a dictionary. Its hash will be stored in the dictionary, but that's not true of general hashing. – Chris Morgan Oct 07 '11 at 00:37
  • @ChrisMorgan Actually I don't think python `dict` caches hash values for its keys. Of course, individual classes may do whatever they like in their `__hash__` function, so the article referenced above said that `str` caches its hash values. – max Apr 13 '15 at 08:44
  • @max: `dict` is a hash table and must by definition store the hashes of its keys. That is what I refer to. What I said is entirely true. – Chris Morgan Apr 13 '15 at 09:39
  • @ChrisMorgan of course, agreed. Originally, I thought you were saying that `dict` stores the mapping from keys to their hashes, thus avoiding the necessity of calling `hash(k)` repeatedly -- that doesn't happen AFAIK. – max Apr 13 '15 at 09:58
  • Related: [Is there any reason *not* to cache an object's hash?](http://stackoverflow.com/q/3787405/12892) – Cristian Ciupitu Jan 26 '16 at 18:31

3 Answers3

27

The hash is generally computed each time it's used, as you can quite easily check yourself (see below). Of course, any particular object is free to cache its hash. For example, CPython strings do this, but tuples don't (see e.g. this rejected bug report for reasons).

The hash value -1 signals an error in CPython. This is because C doesn't have exceptions, so it needs to use the return value. When a Python object's __hash__ returns -1, CPython will actually silently change it to -2.

See for yourself:

class HashTest(object):
    def __hash__(self):
        print('Yes! __hash__ was called!')
        return -1

hash_test = HashTest()

# All of these will print out 'Yes! __hash__ was called!':

print('__hash__ call #1')
hash_test.__hash__()

print('__hash__ call #2')
hash_test.__hash__()

print('hash call #1')
hash(hash_test)

print('hash call #2')
hash(hash_test)

print('Dict creation')
dct = {hash_test: 0}

print('Dict get')
dct[hash_test]

print('Dict set')
dct[hash_test] = 0

print('__hash__ return value:')
print(hash_test.__hash__())  # prints -1
print('Actual hash value:')
print(hash(hash_test))  # prints -2
wim
  • 338,267
  • 99
  • 616
  • 750
Petr Viktorin
  • 65,510
  • 9
  • 81
  • 81
7

From here:

The hash value -1 is reserved (it’s used to flag errors in the C implementation). If the hash algorithm generates this value, we simply use -2 instead.

As integer's hash is integer itself it's just changed right away.

rplnt
  • 2,341
  • 16
  • 14
1

It is easy to see that option #3 holds for user defined objects. This allows the hash to vary if you mutate the object, but if you ever use the object as a dictionary key you must be sure to prevent the hash ever changing.

>>> class C:
    def __hash__(self):
        print("__hash__ called")
        return id(self)


>>> inst = C()
>>> hash(inst)
__hash__ called
43795408
>>> hash(inst)
__hash__ called
43795408
>>> d = { inst: 42 }
__hash__ called
>>> d[inst]
__hash__ called

Strings use option #2: they calculate the hash value once and cache the result. This is safe because strings are immutable so the hash can never change, but if you subclass str the result might not be immutable so the __hash__ method will be called every time again. Tuples are usually thought of as immutable so you might think the hash could be cached, but in fact a tuple's hash depends on the hash of its content and that might include mutable values.

For @max who doesn't believe that subclasses of str can modify the hash:

>>> class C(str):
    def __init__(self, s):
        self._n = 1
    def __hash__(self):
        return str.__hash__(self) + self._n


>>> x = C('hello')
>>> hash(x)
-717693723
>>> x._n = 2
>>> hash(x)
-717693722
Duncan
  • 92,073
  • 11
  • 122
  • 156
  • If you pass a tuple that contains mutable values as an argument to the built-in hash function, the TypeError exception will be raised. So this is not the reason that tuples don't cache their hash values. The link at the beginning of [@PetrViktorin's answer above](http://stackoverflow.com/a/7648538/336527) provides the explanation. See also [Guido's comments](http://mail.python.org/pipermail/python-dev/2003-August/037424.html). Also, are you sure hash is not cached for str subclasses? It seems to return the same value as str.hash, which is automatically cached. – max Apr 13 '15 at 09:09
  • @max, I added an example for you showing that the hash of a `str` subclass is not cached. – Duncan Apr 13 '15 at 17:24
  • ah yes, right.. I guess I was thinking if you don't define `__hash__`, it's cached but then it's obvious since it just uses `str.__hash__` in that case. – max Apr 13 '15 at 19:34