5

How would you convert a nesting of Python frozenset objects into a unique integer that was the same across Python sessions and platforms?

e.g. I get different values from hash() on different platforms

32-bit

Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=frozenset([frozenset([1,2,3]),frozenset(['a','b','c'])]);
>>> hash(a)
1555175235

64-bit

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=frozenset([frozenset([1,2,3]),frozenset(['a','b','c'])]);
>>> hash(a)
-6076998737938213053
Cerin
  • 60,957
  • 96
  • 316
  • 522

2 Answers2

8

How would you convert a nesting of Python frozenset objects into a unique integer that was the same across Python sessions and platforms?

AFAIK hashes are not guaranteed to be unique. In fact where they are used for lookup tables (like in dictionaries) hashes conflicts are quite common.

That said. If you want a consistent, non-unique "hash" across platform, I would try to use the standard library hashlib. I don't have the possibility to try it on different platform, but I believe that most of the algorithms implemented there (as for example the MD5 one) are platform-independent.

I would feed the hashing algorithms with the pickled version of the sorted set, so as to make sure the string used for hashing is always the same.


EDIT: Thought to add a basic example:

>>> import cPickle as pkl
>>> import hashlib as hl
>>> s = frozenset([1,2,3])
>>> p = pkl.dumps(sorted(s))  #make sure you use the same pickle protocol on all platform!
'(lp1\nI1\naI2\naI3\na.'
>>> h = hl.md5(p)
<md5 HASH object @ 0xb76fb110>
>>> h.digest()
"\x89\xaeG\x1d'\x83\xa5\xbd\xac\xa7\x1c\xd9\x1d/2t"  #this should be consistent
Cerin
  • 60,957
  • 96
  • 316
  • 522
mac
  • 42,153
  • 26
  • 121
  • 131
  • For what is worth, I've checked that the md5 sum of the pickled frozenset is the same in both 32 and 64-bit platforms. – jcollado Dec 26 '11 at 23:29
  • 1
    Is a pickling of an unordered type guaranteed to pickle to the same thing on all platforms? I'm reluctant to trust that it will be. I'd probably pull the elements out of the sets into lists, and sort them prior to pickling. – dstromberg Dec 26 '11 at 23:40
  • Thanks. I'd have to recurse through my nested frozensets, and convert them to sorted lists, but otherwise this should work. – Cerin Dec 26 '11 at 23:41
  • 1
    @dstromberg, I don't believe it is. I think that's why he sorts the frozenset first. – Cerin Dec 26 '11 at 23:42
  • There is no guarantee that the same object `pickle`d in two different sessions / platforms will result in identical files - *even if* the objects are ordered. Therefore, you shouldn't rely on this unless you enjoy solving very hard debugging puzzles in the future. Perhaps recursive `sorted` and `repr` might work, since `repr` on sorted objects is probably uniquely defined, but I have neither tried it, nor thought about it enough to have any confidence it would work. – max Oct 31 '12 at 09:12
0

You can also create your own hash function:

def hash(fs):
    res = 1
    for v in fs:
        res = (res*31 + v) % 2**30
    return res

This won't necessarily be unique, but it is as good as the build in set hash, and you have full control over its result across platforms.

Thomas Ahle
  • 30,774
  • 21
  • 92
  • 114