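This is not only the case for `dict`s, but for any kind of string. Since Python 3.3, `str` and `bytes` hashes are salted with a random value chosen when the interpreter starts, so `hash()` of the same string differs between runs: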
```
❯ python --version
Python 3.10.1
❯ python -c "print(hash('hello'))"
805068502777750074
❯ python -c "print(hash('hello'))"
-8272315863596519132
```
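The salt is controlled by the `PYTHONHASHSEED` environment variable. The sketch below (the helper name `hash_in_fresh_interpreter` is made up for illustration) runs `hash()` in fresh interpreter processes, with and without a pinned seed. Pinning the seed makes `hash()` reproducible, but it affects the whole process, so a dedicated hash function such as MD5 is usually the safer choice for values you want to persist.

```python
import os
import subprocess
import sys
from typing import Optional


def hash_in_fresh_interpreter(text: str, seed: Optional[str] = None) -> int:
    """Run hash(text) in a new Python process, optionally pinning PYTHONHASHSEED."""
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)  # default: random salt per process
    else:
        env["PYTHONHASHSEED"] = seed     # fixed salt; "0" disables randomization
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # With a pinned seed, separate processes agree on the hash ...
    print(hash_in_fresh_interpreter("hello", seed="0"))
    print(hash_in_fresh_interpreter("hello", seed="0"))
    # ... without it, each process will almost certainly print a different value.
    print(hash_in_fresh_interpreter("hello"))
    print(hash_in_fresh_interpreter("hello"))
```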
What you could do instead is use a deterministic hash function such as MD5 (via `hashlib.md5`). If you are using an object-oriented approach, you could override the `__hash__` method as follows:
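```python
# persistent_hash.py
import hashlib
import operator as op


class MyClass:
    def __init__(self, name: str, content: dict):
        self.name = name
        self.content = content

    def __hash__(self):
        # Concatenate the attribute values in key-sorted order, so the result
        # does not depend on the order in which the attributes were assigned.
        to_be_hashed = "".join(
            str(value) for _, value in sorted(self.__dict__.items(),
                                              key=op.itemgetter(0))
        )
        # MD5 is deterministic across interpreter runs, unlike the salted
        # built-in str hash. The 16-byte digest is read as one big integer;
        # the built-in hash() then truncates it to the size of a Py_ssize_t.
        return int.from_bytes(
            hashlib.md5(to_be_hashed.encode("utf-8")).digest(),
            "big"
        )


if __name__ == "__main__":
    my_class = MyClass(name="awesome", content={"best_number": 42})
    print(hash(my_class))
```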
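Sorting `__dict__` by key ensures that instances of `MyClass` with the same attribute values get the same hash, even if their attributes were assigned in a different order. Note that `str()` of a `dict` value such as `content` still depends on that dict's own insertion order.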
This returns consistent hash values:
```
❯ python persistent_hash.py
1439132221247659084
❯ python persistent_hash.py
1439132221247659084
```
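If nested dicts or value boundaries matter, one possible refinement (a sketch, not the approach above) is to serialize the instance with `json.dumps(..., sort_keys=True)` before hashing. This assumes every attribute is JSON-serializable; `stable_hash` is a hypothetical helper name. JSON quoting also keeps values like `["ab", "c"]` and `["a", "bc"]` apart, which plain concatenation without a separator does not. An `__eq__` is added so that objects that compare equal also hash equal.

```python
import hashlib
import json


def stable_hash(obj) -> int:
    """Deterministic 128-bit hash of a JSON-serializable object (hypothetical helper).

    json.dumps(sort_keys=True) yields the same text for dicts with the same
    items regardless of insertion order, including nested dicts.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return int.from_bytes(hashlib.md5(canonical.encode("utf-8")).digest(), "big")


class MyClass:
    def __init__(self, name: str, content: dict):
        self.name = name
        self.content = content

    def __eq__(self, other):
        # Equal attribute dicts -> equal objects (and therefore equal hashes).
        return isinstance(other, MyClass) and vars(self) == vars(other)

    def __hash__(self):
        # vars(self) is the instance __dict__; assumes all values are JSON-serializable.
        return stable_hash(vars(self))


if __name__ == "__main__":
    a = MyClass(name="awesome", content={"best_number": 42, "answer": "yes"})
    b = MyClass(name="awesome", content={"answer": "yes", "best_number": 42})
    print(hash(a) == hash(b))  # True: the key order inside content no longer matters
```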
==========
Fun fact: Python 2.x is consistent in hashing strings, because hash randomization is disabled by default there (it can be enabled with the `-R` option):
```
❯ python2 --version
Python 2.7.18
❯ python2 -c "print(hash('hello'))"
840651671246116861
❯ python2 -c "print(hash('hello'))"
840651671246116861
```