I'm developing a program in Python 3.6 and have run into a problem: when I call the built-in hash function (from the standard library) on the same object, the value it returns differs from run to run. For example:

class Generic:
    def __init__(self, id, name, property):
        self.id = id 
        self.name = name
        self.property = property


def main():
    my_object = Generic(3, 'ddkdjsdk', 'casualstring')
    print(hash(my_object))


if __name__ == "__main__":
    main()

I would like the output to always be the same (deterministic), but different values appear on the console: 8765256330262, -9223363264515786864, -9223363262437648366, and others. Why does this happen? I would like this function to be deterministic throughout my application. How do I solve the problem?

Jonathan Hall
claudioz
  • The default `hash` is derived from the object's memory address (in CPython) and is in no way related to those three properties. Implement `__hash__` if you want a specific, deterministic hash based on the object's actual properties. – tobias_k Mar 11 '19 at 12:41
  • Possible duplicate of [What does hash do in python?](https://stackoverflow.com/questions/17585730/what-does-hash-do-in-python) "Note that the hash of a value only needs to be the same for one run of Python. In Python 3.3 they will in fact change for every new run of Python" – Andreas Fester Mar 11 '19 at 12:42
  • Here's a good thread on writing a hash method: [How to implement a good __hash__ function in python](https://stackoverflow.com/questions/4005318/how-to-implement-a-good-hash-function-in-python) – Mark Mar 11 '19 at 12:43
  • This is intentional. See https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions. – chepner Mar 11 '19 at 12:51
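
As a small illustration of the first comment, the sketch below shows that a class without its own `__eq__` and `__hash__` is compared and hashed by identity, not by content. The class name `Plain` is hypothetical, and the statement about distinct hashes assumes CPython, where the default hash is derived from the object's address.

```python
class Plain:
    """Hypothetical class that defines neither __eq__ nor __hash__."""

    def __init__(self, value):
        self.value = value


a = Plain("same")
b = Plain("same")

# Default equality compares identity, so two separate objects are unequal
# even though their attributes match.
print(a == b)

# In CPython the default hash is derived from the object's address, so two
# objects that are alive at the same time get different hashes.
print(hash(a) == hash(b))

# Within one run, hashing the same object repeatedly gives the same value.
print(hash(a) == hash(a))
```

Because the address of an object can change from one run of the program to the next, the value printed by `hash(my_object)` in the question changes too.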

2 Answers

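In this case it's probably easiest to define your own __eq__ and __hash__ methods, so that the hash depends on the object's attributes rather than on its identity. Within a single run this returns the same hash every time. Because the tuple contains strings, the value can still differ between interpreter runs unless the PYTHONHASHSEED environment variable is set to a fixed value: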

class Generic:
    def __init__(self, id, name, property):
        self.id = id
        self.name = name
        self.property = property

    def __eq__(self, other):
        # Return NotImplemented on a type mismatch instead of raising,
        # so comparisons with unrelated objects evaluate to False.
        if self.__class__ is not other.__class__:
            return NotImplemented
        return (self.id, self.name, self.property) == (other.id, other.name, other.property)

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((self.id, self.name, self.property))

This also makes the hashes of equal objects equal:

>>> obj = Generic(1, 'blah', 'blah')
>>> obj2 = Generic(1, 'blah', 'blah')
>>> obj == obj2
True
>>> hash(obj) == hash(obj2)
True

Hope that helps!

Luke

For those looking to get hashes of built-in types, Python's built-in hashlib module might be easier than subclassing to redefine __hash__. Here's an example for strings:

from hashlib import md5

def string_hash(string):
    return md5(string.encode()).hexdigest()

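This returns the same hash for different string objects as long as their content is the same, and the result does not change between runs. Not all objects will work this way, since the input has to be converted to bytes first, but it could save you time depending on your use case.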
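
The same idea can be extended to the `Generic` class from the question. The following is only a sketch under a few assumptions: the method name `stable_digest` is made up for illustration, SHA-256 is used in place of MD5, and the attributes are assumed to be simple values (ints and strings) whose `repr` is the same on every run.

```python
import hashlib


class Generic:
    def __init__(self, id, name, property):
        self.id = id
        self.name = name
        self.property = property

    def stable_digest(self):
        """Hypothetical helper: a hex digest that is stable across runs."""
        # repr() of a tuple of ints and strings does not depend on the
        # memory address or on hash randomization, so the bytes fed to
        # hashlib are identical on every run.
        payload = repr((self.id, self.name, self.property)).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


first = Generic(3, 'ddkdjsdk', 'casualstring')
second = Generic(3, 'ddkdjsdk', 'casualstring')
other = Generic(4, 'ddkdjsdk', 'casualstring')

# Objects with the same content produce the same digest.
print(first.stable_digest() == second.stable_digest())

# Changing any attribute changes the digest.
print(first.stable_digest() == other.stable_digest())
```

Note that this returns a string, not the integer that `hash()` requires, so it is suited to things like cache keys or persisted identifiers rather than as a direct `__hash__` implementation.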