
Context: building a consistent hashing algorithm.

The official documentation for Python's hash() function states:

Return the hash value of the object (if it has one). Hash values are integers.

However, it does not explicitly state whether the function maps to an integer range (with a minimum and a maximum) or not.

Coming from other languages where primitive types are bounded (e.g. C#'s int.MaxValue or Java's Integer.MAX_VALUE), I know that Python likes to think in "unbounded" terms, i.e. switching from int to long in the background.

Am I to assume that the hash() function is also unbounded? Or is it bounded, for example mapping to the max/min values that Python assigns to the "int-proper", i.e. between -2147483648 and 2147483647?

alelom
  • You've read the note at https://docs.python.org/3/reference/datamodel.html#object.__hash__? – deceze Mar 08 '21 at 08:58
  • "switching from int to long in the background": that distinction doesn't exist anymore in Python 3. It's `int` across the entire range, and the switch from "short int" to "long int" to "infinite" is not exposed. – Masklinn Mar 08 '21 at 09:07

2 Answers


As others pointed out, there is a misplaced[1] Note in the documentation that reads:

hash() truncates the value returned from an object’s custom __hash__() method to the size of a Py_ssize_t.

To answer the question, we need the range of Py_ssize_t. After some research, it seems that its maximum value is exposed as sys.maxsize, and the width of a hash value in bits as sys.hash_info.width, although I'd appreciate some feedback here.

The solution I eventually adopted was:

import sys

bits = sys.hash_info.width              # width of a hash value in bits; in my case, 64
print(sys.maxsize)                      # largest Py_ssize_t; in my case, 9223372036854775807

# Therefore (integer arithmetic only, signed two's-complement range):
hash_maxValue = 2 ** (bits - 1) - 1     # 9223372036854775807, or +sys.maxsize
hash_minValue = -(2 ** (bits - 1))      # -9223372036854775808, or -sys.maxsize - 1
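One way to check these bounds empirically is sketched below. It assumes CPython, where a signed hash of width `bits` can range from -sys.maxsize - 1 to sys.maxsize; `FixedHash` is a hypothetical helper class introduced only for this illustration. Values inside the range should pass through hash() unchanged, while a value outside it gets reduced to fit.

```python
import sys


class FixedHash:
    """Hypothetical helper: an object whose __hash__ returns a chosen integer."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return self.value


bits = sys.hash_info.width
hash_max = 2 ** (bits - 1) - 1    # assumed upper bound, same as sys.maxsize
hash_min = -(2 ** (bits - 1))     # assumed lower bound, same as -sys.maxsize - 1

# Values inside the Py_ssize_t range are returned unchanged by hash().
in_range_max = hash(FixedHash(hash_max))
in_range_min = hash(FixedHash(hash_min))

# A value just outside the range is reduced so that it fits again.
overflowed = hash(FixedHash(hash_max + 1))

print(in_range_max == hash_max, in_range_min == hash_min)
print(hash_min <= overflowed <= hash_max)
```

This only probes custom __hash__() implementations; built-in types use their own hashing schemes, but their results are stored in the same fixed-width type.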

Happy to receive comments and feedback on this; until proven wrong, this is the accepted answer.


[1] The note is included in the section dedicated to __hash__() instead of the one dedicated to hash().

alelom

From the documentation:

hash() truncates the value returned from an object’s custom __hash__() method to the size of a Py_ssize_t. This is typically 8 bytes on 64-bit builds and 4 bytes on 32-bit builds. If an object’s __hash__() must interoperate on builds of different bit sizes, be sure to check the width on all supported builds. An easy way to do this is with python -c "import sys; print(sys.hash_info.width)".

More details can be found here: https://docs.python.org/3/reference/datamodel.html#object.__hash__
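As a rough illustration of the interoperability point in that note, here is a minimal sketch of a __hash__() that stays within 31 bits, so hash() should not need to truncate it on either a 32-bit or a 64-bit build. `PortableKey` and the choice of zlib.crc32 are assumptions made for this example, not something the documentation prescribes.

```python
import sys
import zlib


class PortableKey:
    """Hypothetical key type whose hash fits in 31 bits."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, PortableKey) and self.name == other.name

    def __hash__(self):
        # crc32 yields an unsigned 32-bit value; masking to 31 bits keeps it
        # inside the non-negative Py_ssize_t range of a 32-bit build as well.
        return zlib.crc32(self.name.encode("utf-8")) & 0x7FFFFFFF


key = PortableKey("node-a")
print(sys.hash_info.width)            # hash width of the running build
print(hash(key) == key.__hash__())    # no truncation expected for this value
```

Because the value is derived from a CRC rather than from the built-in string hash, it also does not depend on per-process hash randomization, which may matter for a consistent hashing scheme.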

Artjom B.
penguin2048