So, I saw "Hashing a dictionary?" and was trying to figure out a way to handle Python native objects better and produce stable results.
After looking at all the answers and comments, this is what I came up with. Everything seems to work properly, but am I missing something that would make my hashing inconsistent (besides hash algorithm collisions)?
from hashlib import md5

md5(repr(nested_dict).encode()).hexdigest()
tl;dr: it creates a string with repr() and then hashes that string.
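Here is a minimal sketch of the one-liner wrapped in a reusable function. `repr_md5` is a hypothetical name and does not come from any library. The sketch also makes one property of the approach visible: repr() writes dict items in insertion order, so two dicts that compare equal can still produce different digests.

```python
from hashlib import md5


def repr_md5(obj) -> str:
    """Hypothetical helper: MD5-hash the text produced by the built-in repr()."""
    return md5(repr(obj).encode()).hexdigest()


a = {1: {2: ''}, 3: {4: ''}}
b = {1: {2: ''}, 3: {4: ''}}
c = {3: {4: ''}, 1: {2: ''}}  # same items as a, inserted in a different order

print(repr_md5(a) == repr_md5(b))  # True: same content, same insertion order
print(a == c)                      # True: dict equality ignores order
print(repr_md5(a) == repr_md5(c))  # False: repr() follows insertion order
```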
I generated my test nested dict with this:
nested_dict = {}
for i in range(100):
    for j in range(100):
        if not nested_dict.get(i, None):
            nested_dict[i] = {}
        nested_dict[i][j] = ''
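For comparison, a nested dict comprehension builds the same test data in one expression. This is an alternative to the loop, not what was used for the original test. Both versions insert keys in the same order, so their repr() strings should match, and their digests should match too.

```python
from hashlib import md5

# The loop from the question, with the dict initialized first.
nested_dict = {}
for i in range(100):
    for j in range(100):
        if not nested_dict.get(i, None):
            nested_dict[i] = {}
        nested_dict[i][j] = ''

# Same structure as a nested dict comprehension (same key insertion order).
comprehension_dict = {i: {j: '' for j in range(100)} for i in range(100)}

digest_loop = md5(repr(nested_dict).encode()).hexdigest()
digest_comp = md5(repr(comprehension_dict).encode()).hexdigest()

print(nested_dict == comprehension_dict)  # True
print(digest_loop == digest_comp)         # True
```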
I'd imagine repr() should be able to support any Python object, since most objects implement __repr__, but I'm still pretty new to Python programming. One thing I've heard is that if you use from reprlib import repr instead of the built-in repr, it will truncate large sequences. So that's one potential downside, but the built-in repr doesn't seem to do that for the native list and set types.
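Here is a small sketch of the truncation concern, using the standard library's reprlib module. The sample lists are made up for illustration. With reprlib's default limits (at most 6 list items shown), different long lists can abbreviate to the same text and therefore hash to the same digest. The built-in repr() keeps every element.

```python
import reprlib
from hashlib import md5

long_list = list(range(100))
other_list = list(range(50))

# reprlib.repr abbreviates long containers with "...".
print(reprlib.repr(long_list))   # [0, 1, 2, 3, 4, 5, ...]
print(reprlib.repr(other_list))  # [0, 1, 2, 3, 4, 5, ...]

# Different lists, identical abbreviated text, so identical digests.
short_a = md5(reprlib.repr(long_list).encode()).hexdigest()
short_b = md5(reprlib.repr(other_list).encode()).hexdigest()
print(short_a == short_b)  # True

# The built-in repr() writes out every element, so these digests differ.
full_a = md5(repr(long_list).encode()).hexdigest()
full_b = md5(repr(other_list).encode()).hexdigest()
print(full_a == full_b)  # False
```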
Other notes:
- I'm not able to use https://stackoverflow.com/a/5884123, because I'm going to have nested dictionaries.
- I used python 3.9.7 when testing this out.
- I'm not able to use https://stackoverflow.com/a/22003440 either, because at the time of hashing the dict still has IPv4 address objects as keys (json.dumps didn't like that too much).
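This sketch shows why the two linked approaches break on this kind of data. It assumes the first link is the `hash(frozenset(d.items()))` answer and the second is the `json.dumps(..., sort_keys=True)` answer. The sample `data` dict is hypothetical.

```python
import json
from hashlib import md5
from ipaddress import IPv4Address

# Hypothetical sample data: an IPv4Address key mapping to a nested dict.
data = {IPv4Address('192.0.2.1'): {'port': 22}}

# Assumed frozenset approach: hash the dict's items.
# Fails because the values are dicts, and dicts are unhashable.
try:
    hash(frozenset(data.items()))
    frozenset_ok = True
except TypeError:
    frozenset_ok = False

# Assumed json approach: serialize with sorted keys, then hash.
# Fails because JSON object keys must be str, int, float, bool or None.
try:
    json.dumps(data, sort_keys=True)
    json_ok = True
except TypeError:
    json_ok = False

# The repr-based approach from the question accepts both.
text = repr(data)
digest = md5(text.encode()).hexdigest()

print(frozenset_ok, json_ok)  # False False
print(text)                   # {IPv4Address('192.0.2.1'): {'port': 22}}
print(digest)
```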