Python hash() function returns different values for the same input

Question

Yes, you read the title correctly. I'm trying to figure out why the built-in hash() function in Python would return a different digest for the same input.

This is the code that computes the hash:

    import logging

    # element is an instance of a typing.NamedTuple
    def compute_hash(self, element):
        values = self.get_key_values_tuple(element)
        _hash = hash(values)
        logging.info(f'Hash of {values} is {_hash}')
        return _hash

    # self.keys in this instance is ['session_id', 'time', 'wearable_id']
    def get_key_values_tuple(self, element: tuple) -> tuple:
        # Collect the key attributes of the element, in the order given by self.keys
        return tuple(getattr(element, key) for key in self.keys)

This code generates these results:

and generates these logs:

Keep in mind that this code works perfectly on other datasets with the same input data types. It also works intermittently on this dataset: sometimes the hash is the same for the same input triplets, and sometimes it is different.

A bit more context:

Using Python 3.8.

I'm building Apache Beam components that run on GCP Dataflow. This means that the code can be executed on different machines, but the VM/container in which it's being executed is always the same (i.e. the exact same environment).

"This means that the code can be executed on different machines" - `hash` values are not intended to be consistent across different Python processes, let alone on different machines. — user2357112, Oct 13 '21 at 22:15
Really? Do you have documentation on that? I've been looking for a reason. Feel free to answer the question. — Simon Corcos, Oct 13 '21 at 22:16
Is this the same as https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions ? — Matt Cliff, Oct 13 '21 at 22:18
`hash` is meant for hash-based containers, e.g. dict and set. It isn't guaranteed to be consistent across Python processes, and indeed, is often purposefully randomized. — juanpa.arrivillaga, Oct 13 '21 at 22:24
`hash()` is made to be fast (it is used a lot within Python code). Using `hashlib` you can solve all of these problems (including the DoS vulnerability of `hash`), but Python would slow down too much if a strong hash function were used instead of `hash()`. — Giacomo Catenazzi, Oct 14 '21 at 07:49
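
To make the comments above concrete, here is a minimal sketch of a process-independent hash built on `hashlib`, plus a small helper that shows how the built-in `hash()` of a string depends on `PYTHONHASHSEED`. The `Reading` type, its example values, and the helper names `stable_hash` and `builtin_hash_in_new_process` are hypothetical; only the key names come from the question. It also assumes that `repr()` of every key value is deterministic, which holds for str, int and float but may not hold for arbitrary objects.

```python
import hashlib
import os
import subprocess
import sys
from typing import NamedTuple


class Reading(NamedTuple):
    # Hypothetical record type; the field names mirror self.keys from the question.
    session_id: str
    time: int
    wearable_id: str


KEYS = ["session_id", "time", "wearable_id"]


def stable_hash(element, keys=KEYS) -> int:
    """Digest of the key fields that does not depend on the interpreter's hash seed."""
    values = tuple(getattr(element, key) for key in keys)
    # Assumption: repr() of every key value is deterministic (true for str, int, float).
    digest = hashlib.sha256(repr(values).encode("utf-8")).digest()
    # Keep the first 8 bytes so the result fits in an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big")


def builtin_hash_in_new_process(seed: str) -> str:
    """Run hash('abc') in a fresh interpreter started with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('abc'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    reading = Reading("s-1", 1634163300, "w-42")
    # Same value on every run, in every process, on every machine.
    print(stable_hash(reading))
    # Built-in hash of the same string under two different seeds.
    print(builtin_hash_in_new_process("1"), builtin_hash_in_new_process("2"))
```

If the built-in `hash()` has to stay, setting the `PYTHONHASHSEED` environment variable to the same fixed value on every worker makes str and bytes hashes repeatable across processes, but whether that is practical on Dataflow workers is not something this sketch verifies.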
