1

As it was elaborated in this post, the command

str(hash(frozenset(kwargs.items())))

yields different results for the same dictionary if you restart the interpreter. Now the obvious question: Since I need the hash for caching, I need a hashing which is the same when the dictionary is the same (Otherwise caching would not make any sense). So to say, how can I get an injective hash for every (not nested) dictionary?

Community
  • 1
  • 1
varantir
  • 6,624
  • 6
  • 36
  • 57
  • Does this post answers your questions : http://stackoverflow.com/questions/1151658/python-hashable-dicts – Azurtree Jun 11 '15 at 14:37
  • No, it yields the same problem as mentioned above. edit: It does not work for me! – varantir Jun 11 '15 at 14:39
  • What problem would that be? Mentioned where? Try to provide more details than "it does not work", such as an example demonstrating. – interjay Jun 11 '15 at 14:39
  • what kind of randomization? Hash says "Two objects with the same value have the same hash value" – Pynchia Jun 11 '15 at 14:40
  • I get different hashes for the same dictionary! As it was said in the commentaries "It might be interesting the hash() function does not produce a stable output. This means that, given the same input, it returns different results with different instances of the same python interpreter. To me, it looks like some sort of seed value is generated every time the interpreter is started. " – varantir Jun 11 '15 at 14:41
  • So you want the hash value to remain the same after restarting the Python interpreter? You really should have mentioned that... – interjay Jun 11 '15 at 14:41
  • ha-ha! Please edit your question and add in that crucial bit of information – Pynchia Jun 11 '15 at 14:42
  • can you use a `collections.OrderedDict` instead of a plain dictionary? – Pynchia Jun 11 '15 at 15:10

2 Answers2

1

This is not only the case for dicts, but also for any kind of string:

❯ python --version
Python 3.10.1
❯ python -c "print(hash('hello'))"
805068502777750074
❯ python -c "print(hash('hello'))"
-8272315863596519132

What you could do instead is using another hashing method like md5. If you are using an object oriented approach, you could overwrite the __hash__ method like follows:

# persistent_hash.py
import hashlib
import operator as op


class MyClass:
    def __init__(self, name: str, content: dict):
        self.name = name
        self.content = content

    def __hash__(self):
        to_be_hashed = "".join(
            str(value) for _, value in sorted(self.__dict__.items(),
                                              key=op.itemgetter(0))
        )
        return int.from_bytes(
            hashlib.md5(to_be_hashed.encode("utf-8")).digest(),
            "big"
        )


if __name__ == "__main__":
    my_class = MyClass(name="awesome", content={"best_number": 42})
    print(hash(my_class))

Sorting __dict__ by key ensures the same hash for all attributes of MyClass, even if new members are inserted to the class in different order. This returns consistent hash values:

❯ python persistent_hash.py
1439132221247659084
❯ python persistent_hash.py
1439132221247659084

==========

Fun fact: python 2.X seems to be consistent in hashing strings:

❯ python2 --version
Python 2.7.18
❯ python2 -c "print(hash('hello'))"
840651671246116861
❯ python2 -c "print(hash('hello'))"
840651671246116861
tschmelz
  • 480
  • 1
  • 4
  • 10
0

A simple

":".join(key for key in sorted(kwargs.items()))

should suffice. Of course this assumes keys are strings and sorting them is stable, if not, you need to invoke str around them (and make sure it's meaningful) and/or provide proper comparison.

deets
  • 6,285
  • 29
  • 28