Note this interesting behavior of the hash()
builtin:
>>> [x for x in range(-2**31, 2**31) if x != hash(x)]
[-1]
>>> hash(-1)
-2
This is on Python 3.6.2 (it also seems the same behaviors occurs on 2.7.10). The only number in [-2**31, 2**31)
that doesn't hash to itself is -1
. Why is this?
I've dug into the Python source to see if I could uncover the reason, but came up with a few places which suggest that hash(-1) = -1
should hold:
https://github.com/python/cpython/blob/3.6/Python/pyhash.c#L34-L83:
/* For numeric types, the hash of a number x is based on the reduction
of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that
hash(x) == hash(y) whenever x and y are numerically equal, even if
x and y have different types.
A quick summary of the hashing strategy:
(1) First define the 'reduction of x modulo P' for any rational
number x; this is a standard extension of the usual notion of
reduction modulo P for integers. If x == p/q (written in lowest
terms), the reduction is interpreted as the reduction of p times
the inverse of the reduction of q, all modulo P; if q is exactly
divisible by P then define the reduction to be infinity. So we've
got a well-defined map
reduce : { rational numbers } -> { 0, 1, 2, ..., P-1, infinity }.
(2) Now for a rational number x, define hash(x) by:
reduce(x) if x >= 0
-reduce(-x) if x < 0
*/
Note above that for negative numbers hash(x) = -reduce(-x)
.
It seems like on x86_64 the hashes are uint64s and the above invariant holds for approximately the range [-2**62, 2**62)
. Why is -1
special in that it (and -2
, technically) are the only integers to collide in this range?