In Python 2.7, I'm successfully using hash() to place objects into buckets stored persistently on disk. A mock-up looks like this:
    class PersistentDict(object):
        def __setitem__(self, key, value):
            # Mask to 32 bits, then map the hash onto one of the buckets.
            bucket_index = (hash(key) & 0xffffffff) % self.bucket_count
            self._store_to_bucket(bucket_index, key, value)

        def __getitem__(self, key):
            bucket_index = (hash(key) & 0xffffffff) % self.bucket_count
            return self._fetch_from_bucket(bucket_index)[key]
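In Python 3, hash() salts str and bytes hashes with a value that is random per process by default, or fixed if PYTHONHASHSEED is set, which makes it unusable or at best suboptimal for this [1]. Apparently, the salt can only be set for the whole interpreter, not for specific invocations of hash(). So, I need an alternative: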
- Must be stable across interpreter invocations
- May require parameters supplied at execution time, e.g. setting a salt in the call
- Must support arbitrary objects (anything supported by dict/set)
I've already tried hash functions from hashlib (slow!) and checksums from zlib (apparently not ideal for hashing, but meh), which work fine with strings (once encoded) and bytes. However, they only accept bytes-like objects, whereas hash() works with almost everything.
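For reference, the bytes-only attempts can be sketched roughly as follows. The function names and the BUCKET_COUNT value are hypothetical and exist only for illustration; taking the first four digest bytes is one arbitrary way to get a 32-bit number, not the only one.

```python
import hashlib
import zlib

BUCKET_COUNT = 16  # hypothetical value, for illustration only


def bucket_via_crc32(data):
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    return zlib.crc32(data) % BUCKET_COUNT


def bucket_via_sha256(data):
    # Interpret the first 4 bytes of the digest as an unsigned 32-bit integer.
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") % BUCKET_COUNT


key = b"some key"
print(bucket_via_crc32(key), bucket_via_sha256(key))

# Both functions reject keys that are not bytes-like, e.g. a tuple.
try:
    bucket_via_crc32(("a", 1))
except TypeError as exc:
    print("TypeError:", exc)
```

Both results are stable across interpreter invocations because they depend only on the input bytes, but anything that is not bytes-like has to be turned into bytes first, which is exactly the missing piece.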
[1] Using hash() to identify buckets is either:
- Not reliable across interpreter invocations, if salts are random
- A blocker for applications that want to use the random salting feature, if salts are fixed
- Unusable if two PersistentDicts were created with different salts
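The salt dependence described above can be observed with a small experiment. This is a sketch under the assumption that starting a fresh interpreter via subprocess is acceptable for testing; the helper name str_hash_in_new_interpreter is made up for this example.

```python
import os
import subprocess
import sys


def str_hash_in_new_interpreter(text, seed=None):
    """Return hash(text) as computed by a freshly started interpreter."""
    env = dict(os.environ)
    # Drop any inherited seed so that seed=None means "random salt".
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out)


# Same fixed seed in two separate interpreters: the hashes agree.
fixed_a = str_hash_in_new_interpreter("key", seed=0)
fixed_b = str_hash_in_new_interpreter("key", seed=0)
print(fixed_a == fixed_b)  # True

# No seed set: each interpreter picks its own salt, so these usually differ.
print(str_hash_in_new_interpreter("key"), str_hash_in_new_interpreter("key"))
```

Fixing PYTHONHASHSEED makes the bucket index reproducible, but only for the whole process, which is the trade-off listed in the second point.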