Why does Python's hash of infinity have the digits of π?

Question

The hash of infinity in Python has digits matching pi:

>>> inf = float('inf')
>>> hash(inf)
314159
>>> int(math.pi*1e5)
314159

Is that just a coincidence or is it intentional?

Not certain, but my guess would be that it's as deliberate as `hash(float('nan'))` being `0`. — cs95, May 20 '19 at 20:04
Hmm, no mention about that in [`sys.hash_info`](https://docs.python.org/3/library/sys.html#sys.hash_info). Easter egg? — wim, May 20 '19 at 20:09
Seems like someone was in a frivolous mood one day. But -- why not? — John Coleman, May 20 '19 at 20:11
Ask Tim Peters. Here's the commit where he introduced this constant, 19 years ago: https://github.com/python/cpython/commit/39dce29365d287dc6b353b2a527dc11fe58dcfa6. I kept those special values when I reworked the numeric hash in https://bugs.python.org/issue8188 — Mark Dickinson, May 20 '19 at 20:38
@MarkDickinson Thanks. It looks like Tim may have also used the digits of [e](https://en.wikipedia.org/wiki/E_(mathematical_constant)) for hash of -inf originally. — wim, May 20 '19 at 20:42
@wim Ah yes, true. And apparently I changed that to `-314159`. I'd forgotten about that. — Mark Dickinson, May 20 '19 at 20:44

ShreevatsaR · Answer 1 · 2019-05-23T18:24:46.110

Summary: It's not a coincidence; _PyHASH_INF is hardcoded as 314159 in the default CPython implementation of Python, and was picked as an arbitrary value (obviously from the digits of π) by Tim Peters in 2000.

The value of hash(float('inf')) is one of the system-dependent parameters of the built-in hash function for numeric types, and is also available as sys.hash_info.inf in Python 3:

>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>> sys.hash_info.inf
314159

(Same results with PyPy too.)

In terms of code, hash is a built-in function. Calling it on a Python float object invokes the function whose pointer is given by the tp_hash attribute of the built-in float type (PyTypeObject PyFloat_Type), which is the float_hash function, defined as return _Py_HashDouble(v->ob_fval), which in turn has

    if (Py_IS_INFINITY(v))
        return v > 0 ? _PyHASH_INF : -_PyHASH_INF;

where _PyHASH_INF is defined as 314159:

#define _PyHASH_INF 314159

In terms of history, the first mention of 314159 in this context in the Python code (you can find this with git bisect or git log -S 314159 -p) was added by Tim Peters in August 2000, in what is now commit 39dce293 in the cpython git repository.

The commit message says:

Fix for http://sourceforge.net/bugs/?func=detailbug&bug_id=111866&group_id=5470. This was a misleading bug -- the true "bug" was that hash(x) gave an error return when x is an infinity. Fixed that. Added new Py_IS_INFINITY macro to pyport.h. Rearranged code to reduce growing duplication in hashing of float and complex numbers, pushing Trent's earlier stab at that to a logical conclusion. Fixed exceedingly rare bug where hashing of floats could return -1 even if there wasn't an error (didn't waste time trying to construct a test case, it was simply obvious from the code that it could happen). Improved complex hash so that hash(complex(x, y)) doesn't systematically equal hash(complex(y, x)) anymore.

In particular, in this commit he ripped out the code of static long float_hash(PyFloatObject *v) in Objects/floatobject.c and made it just return _Py_HashDouble(v->ob_fval);, and in the definition of long _Py_HashDouble(double v) in Objects/object.c he added the lines:

        if (Py_IS_INFINITY(intpart))
            /* can't convert to long int -- arbitrary */
            v = v < 0 ? -271828.0 : 314159.0;

So as mentioned, it was an arbitrary choice. Note that 271828 is formed from the first few decimal digits of e.

Related later commits:

By Mark Dickinson in Apr 2010 (also), making the Decimal type behave similarly
By Mark Dickinson in Apr 2010 (also), moving this check to the top and adding test cases
By Mark Dickinson in May 2010 as issue 8188, completely rewriting the hash function to its current implementation, but retaining this special case, giving the constant a name _PyHASH_INF (also removing the 271828 which is why in Python 3 hash(float('-inf')) returns -314159 rather than -271828 as it does in Python 2)
By Raymond Hettinger in Jan 2011, adding an explicit example in the "What's new" for Python 3.2 of sys.hash_info showing the above value. (See here.)
By Stefan Krah in Mar 2012 modifying the Decimal module but keeping this hash.
By Christian Heimes in Nov 2013, moved the definition of _PyHASH_INF from Include/pyport.h to Include/pyhash.h where it now lives.

The choice of -271828 for -Inf eliminates any doubt that the pi association was accidental. — Russell Borogove, May 21 '19 at 04:30
@RussellBorogove No but it makes it about one million times less likely ;) — pipe, May 21 '19 at 15:01
@cmaster: See the part above where it says May 2010, namely the documentation section on [hashing of numeric types](https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types) and [issue 8188](https://bugs.python.org/issue8188) — the idea is that we want `hash(42.0)` to be the same as `hash(42)`, also the same as `hash(Decimal(42))` and `hash(complex(42))` and `hash(Fraction(42, 1))`. The solution (by Mark Dickinson) is an elegant one IMO: defining a mathematical function that works for any rational number, and using the fact that floating-point numbers are rational numbers too. — ShreevatsaR, May 22 '19 at 13:22
@ShreevatsaR Ah, thank you. While I would not have cared to guarantee these equalities, it's good to know that there is a good, solid, and logical explanation for the seemingly complex code :-) — cmaster - reinstate monica, May 22 '19 at 14:00
@cmaster Actually, that's a necessity. Hashable objects which compare equal must have the same hash value. — wim, May 22 '19 at 22:51
@cmaster The hash function for integers is simply `hash(n) = n % M` where M = (2^61 - 1). This is generalized for rational n to `hash(p/q) = (p/q) mod M` with the division being interpreted modulo M (in other words: `hash(p/q) = (p * inverse(q, M)) % M`). The reason we want this: if into a dict `d` we put `d[x] = foo` and then we have `x==y` (e.g. 42.0==42) but `d[y]` is not the same as `d[x]`, then we'd have a problem. Most of the seemingly complex code comes from the nature of the floating-point format itself, to recover the fraction properly and needing special-cases for inf and NaN values. — ShreevatsaR, May 22 '19 at 23:22

score 49 · Accepted Answer · edited Mar 15 '21 at 22:45

49

_PyHASH_INF is defined as a constant equal to 314159.

I can't find any discussion about this, or comments giving a reason. I think it was chosen more or less arbitrarily. I imagine that as long as they don't use the same meaningful value for other hashes, it shouldn't matter.

edited Mar 15 '21 at 22:45

Community

1
1

answered May 20 '19 at 20:19

Patrick Haugh

59,226
13
88
96

8

Small nitpick: it is almost inevitable by definition that the same value will be used for other hashes, e.g. in this case `hash(314159)` is also `314159`. Also try, in Python 3, `hash(2305843009214008110) == 314159` (this input is `314159 + sys.hash_info.modulus`) etc. – ShreevatsaR May 21 '19 at 11:43
3

@ShreevatsaR I just meant that as long as they don't choose this value to be the hash of other values by definition, then choosing a meaningful value like this doesn't increase the chance of hash collisions – Patrick Haugh May 21 '19 at 13:37

Alec · Answer 3 · 2019-05-24T02:04:09.043

13

Indeed,

sys.hash_info.inf

returns 314159. The value is not generated, it's built into the source code. In fact,

hash(float('-inf'))

returns -271828, or approximately -e, in python 2 (it's -314159 now).

The fact that the two most famous irrational numbers of all time are used as the hash values makes it very unlikely to be a coincidence.

edited May 24 '19 at 02:04

answered May 21 '19 at 16:39

Alec

8,529
8
37
63

Why does Python's hash of infinity have the digits of π?

3 Answers3