Yes, you read the title correctly. I'm trying to figure out why Python's built-in hash()
function would return a different value for the same input.
This is the code that computes the hash:
```python
# element is an instance of a typing.NamedTuple
def compute_hash(self, element):
    values = self.get_key_values_tuple(element)
    _hash = hash(values)
    logging.info(f'Hash of {values} is {_hash}')
    return _hash

# self.keys in this instance is ['session_id', 'time', 'wearable_id']
def get_key_values_tuple(self, element: tuple) -> tuple:
    return tuple(map(lambda key: getattr(element, key), self.keys))
```
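For a standalone reproduction, the two methods above can be wrapped in a small class and run locally. This is a sketch under assumptions: `SensorReading` and `KeyHasher` are hypothetical names, and the field types (`str`, `datetime`, `str`) are guesses, since the real element type isn't shown. Within a single interpreter process, two elements with equal key fields always produce the same hash:

```python
import datetime
import logging
from typing import NamedTuple

logging.basicConfig(level=logging.INFO)


class SensorReading(NamedTuple):
    # Hypothetical element type; field types are assumptions.
    session_id: str
    time: datetime.datetime
    wearable_id: str
    value: float


class KeyHasher:
    # Hypothetical wrapper so the question's two methods run standalone.
    def __init__(self, keys):
        self.keys = keys

    def get_key_values_tuple(self, element: tuple) -> tuple:
        # Pull the key fields out of the NamedTuple in a fixed order.
        return tuple(getattr(element, key) for key in self.keys)

    def compute_hash(self, element) -> int:
        values = self.get_key_values_tuple(element)
        _hash = hash(values)
        logging.info(f'Hash of {values} is {_hash}')
        return _hash


hasher = KeyHasher(['session_id', 'time', 'wearable_id'])
t = datetime.datetime(2021, 1, 1, 12, 0, 0)
a = SensorReading('s-1', t, 'w-7', 1.0)
b = SensorReading('s-1', t, 'w-7', 2.0)  # same keys, different payload

# Equal key tuples hash equally inside one process.
print(hasher.compute_hash(a) == hasher.compute_hash(b))  # True
```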
This code generates these results:
Keep in mind that this code works perfectly on other datasets with the same input data types. It also works intermittently on this dataset: sometimes the same input triplet produces the same hash, and sometimes it produces a different one.
A bit more context:
- I'm using Python 3.8.
- I'm building Apache Beam components that run on GCP Dataflow. This means the code can execute on different machines, but the VM/container it runs in is always the same (i.e. the exact same environment).
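One explanation worth ruling out, stated as an assumption because the field types aren't shown: if `session_id`, `wearable_id`, or `time` are `str` values (or, in CPython, `datetime` objects, whose hash is built on the same salted byte hash), then `hash()` is randomized per interpreter process via `PYTHONHASHSEED`. Two Dataflow workers running the identical container image can therefore hash the same tuple differently, while plain `int` keys would hash the same everywhere. The sketch below uses hypothetical helper names (`hash_in_fresh_interpreter`, `stable_hash`). It spawns fresh interpreters to show the effect, then derives a process-independent key from `hashlib.sha256` instead of `hash()`:

```python
import hashlib
import os
import subprocess
import sys


def hash_in_fresh_interpreter(value: str, seed: str) -> int:
    """Run hash() on a str in a new Python process with a given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def stable_hash(values: tuple) -> int:
    """Process-independent hash: first 8 bytes of SHA-256 over the tuple's repr.

    Assumes every element has a deterministic repr (true for str, int, datetime;
    not for objects whose default repr includes a memory address).
    """
    digest = hashlib.sha256(repr(values).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


if __name__ == "__main__":
    # Different seeds stand in for different worker processes.
    print(hash_in_fresh_interpreter("session-1", "1"))
    print(hash_in_fresh_interpreter("session-1", "2"))
    # Same value in every process, on every machine.
    print(stable_hash(("session-1", "wearable-7")))
```

Setting `PYTHONHASHSEED` to a fixed value on every worker would also make `hash()` repeatable. Hashing the key bytes explicitly avoids depending on how the runner launches its processes.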