I have a problem where I randomly generate a dictionary, with a potentially large number of possible outcomes (say, 25,000 different dicts). I want to generate an identifier (an ID) for each of these possibilities. What I want is:
- If two dictionaries have exactly the same values for each key, then the ID is the same
- If two dictionaries have a different ID, then they must have at least one difference in their content.
- The ID stays the same every time I run the program (`id(x)` does not work)
- Bonus: the ID stays the same across different versions of Python (2.6, 2.7, 3.4, 3.6)
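To illustrate why `id(x)` cannot serve here, a minimal sketch (the dictionaries `a` and `b` are made up for the example): `id()` identifies the object, not its content, so two equal dictionaries get different values, and those values are not stable between runs. As far as I know, the built-in `hash()` of strings is also salted per process by default on Python 3.3+, so it would not satisfy the "same on every run" requirement either.

```python
# Two separate dictionaries with identical content (example data only).
a = {"x": 1.0, "y": 2}
b = {"x": 1.0, "y": 2}

print(a == b)          # True: same keys and values
print(id(a) == id(b))  # False: two distinct live objects never share an id
```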
My current idea is to use hash functions (although I understand little about them) and do something like this (assuming a dictionary of int/float values):

    import hashlib

    def getID(mydic):
        ID = 0
        for x in mydic.keys():
            # Hash the content
            ID = ID + int(hashlib.sha256(str(mydic[x]).encode('utf-8')).hexdigest(), 16)
            # Hash the key
            ID = ID + int(hashlib.sha256(x.encode('utf-8')).hexdigest(), 16)
        return (ID % 10**10)

To my understanding, this should work in most cases, but depending on the actual keys and contents of the dictionary, it's not impossible for two different dicts to yield the same ID. For example, if I do not hash the keys and two different entries can both be "1.0", then I have a problem.
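The concern can be made concrete with a small sketch. It reproduces `getID` from above so that it runs on its own; the dictionaries `d1` to `d4` are made-up examples. Because the key hashes and value hashes are simply added together, the sum does not record which value belongs to which key, so swapping values between keys (or swapping a key with its value) gives the same ID, even before the final modulo.

```python
import hashlib

def getID(mydic):
    # Same logic as the function above, repeated so this snippet is self-contained.
    ID = 0
    for x in mydic.keys():
        ID = ID + int(hashlib.sha256(str(mydic[x]).encode('utf-8')).hexdigest(), 16)
        ID = ID + int(hashlib.sha256(x.encode('utf-8')).hexdigest(), 16)
    return ID % 10**10

# Same keys and same set of values, but paired differently.
d1 = {"a": 1, "b": 2}
d2 = {"a": 2, "b": 1}

# Key and value swapped: str(2) hashes like the key "2", str(1) like the key "1".
d3 = {"1": 2}
d4 = {"2": 1}

print(d1 == d2, getID(d1) == getID(d2))  # False True
print(d3 == d4, getID(d3) == getID(d4))  # False True
```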
Do you have anything to suggest, which hopefully does not rely on luck?
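For reference, one possible direction (a sketch, not something taken from my code): build a single canonical text form of the whole dictionary with sorted keys and hash that once, so that key/value pairing is preserved. The helper name `dict_id` and the example dictionaries are hypothetical. It assumes string keys and JSON-serialisable int/float values. Two caveats I am aware of: the text form of a float may differ between Python versions (2.6 in particular), so the bonus requirement is not guaranteed, and `1` and `1.0` serialise differently even though they compare equal. A fixed-size hash also cannot strictly rule out collisions; if that matters, the canonical string itself could be used as the ID.

```python
import hashlib
import json

def dict_id(mydic):
    # Hypothetical helper: canonical serialisation (sorted keys, no extra
    # whitespace), then a single SHA-256 over the whole string.
    canonical = json.dumps(mydic, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"lr": 0.1, "depth": 3}
b = {"depth": 3, "lr": 0.1}   # same content, different insertion order
c = {"lr": 3, "depth": 0.1}   # same values, paired with different keys

print(dict_id(a) == dict_id(b))  # True
print(dict_id(a) == dict_id(c))  # False
```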
Edit: I've added a larger code sample showing what I'm trying to do; it's basically a random parameter optimisation. Code on pastebin