
I'm looking for a way to set Python's hash() salt for individual calls to the function. In the docs, I've only found PYTHONHASHSEED, which sets the salt for all calls to hash(). However, I need hash() to always give me the same result when called on specific objects, but I don't want to force the entire application to use the same (predictable) salt.


Context: In Python 2, I'm using hash() to sort key-value pairs into indexed buckets. Buckets are stored persistently. The same index computation is used to find the bucket again when fetching a value. Basically, for every pair I do:

class PDict(object):
  def __init__(self, bucket_count, bucket_store_path):
    self._path, self.bucket_count = \
      self._fetch_or_store_metadata(bucket_store_path, bucket_count)

  def __setitem__(self, key, value):
    # Mask the hash to 32 bits, then map it onto one of the buckets
    bucket_index = (hash(key)&0xffffffff) % self.bucket_count
    self.buckets[bucket_index][key] = value
    self._store_bucket(bucket_index)

  def __getitem__(self, key):
    # The same index computation locates the bucket on lookup
    bucket_index = (hash(key)&0xffffffff) % self.bucket_count
    return self._fetch_bucket(bucket_index)[key]

This requires hash() to always give me the same result per instance, across interpreter invocations.

MisterMiyagi
  • And why would you want otherwise? – jonrsharpe Jun 23 '16 at 21:08
  • Since the buckets are stored persistently, I need the same `hash` salt per bucket set. First problem: using a fixed salt means **any** application using the data structure must use the same salt. This defeats the point of the salt. Second problem: using a dynamic salt means an application may end up using two bucket sets, each requiring a **different** salt. This is plain impossible with just PYTHONHASHSEED. – MisterMiyagi Jun 23 '16 at 21:14
  • @jonrsharpe thanks for the related question, guess that answers it as "not possible". Oh well. – MisterMiyagi Jun 24 '16 at 08:32
  • You should not be using `hash()` at all if you need the value for any other purpose than Python dictionaries and sets. Use a cryptographic hashing function. Yes, this means you'll have to convert your objects to a (canonical) string representation first. – Martijn Pieters Dec 07 '17 at 10:35
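
The last comment's suggestion (serialize the key canonically, then use a cryptographic hash) could look roughly like the sketch below. It assumes keys are JSON-serializable, which the question does not state; `canonical_bytes` and `stable_bucket_index` are hypothetical names, and SHA-256 is just one possible choice of hash function.

```python
import hashlib
import json
import struct


def canonical_bytes(key):
    # Assumption: keys are JSON-serializable (str, int, float, bool, None,
    # lists/tuples, dicts). sort_keys makes dict ordering irrelevant.
    text = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def stable_bucket_index(key, bucket_count):
    # Hash the canonical form; this does not depend on PYTHONHASHSEED.
    digest = hashlib.sha256(canonical_bytes(key)).digest()
    # Interpret the first 4 bytes as an unsigned 32-bit big-endian integer.
    (value,) = struct.unpack(">I", digest[:4])
    return value % bucket_count
```

Note that JSON serializes tuples and lists identically, so `(1, 2)` and `[1, 2]` would land in the same bucket under this scheme; keys of other types would need their own canonical encoding.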

1 Answer

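import hashlib

def getHash(name):
   # md5 only accepts bytes, so encode text (unicode) input as UTF-8 first
   if not isinstance(name, bytes):
      name = name.encode('utf-8')
   m = hashlib.md5()
   m.update(name)
   return m.hexdigest()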
galaxyan
  • This only works for byte objects, strings if being generous. On top of that, it's 10x slower (still at the µs scale, though). – MisterMiyagi Jun 23 '16 at 21:24
  • @MisterMiyagi could you process the object first and then pass it into the function, using id or some other method? – galaxyan Jun 23 '16 at 21:27
  • In principle, yes, but I don't know any stable conversion. That's why I've used `hash` in the first place. `id` will only get the memory location, so it is not predictable across interpreter invocations either. Both `str` and `repr` may produce very long strings (bad) or fall back to `id` (worse). – MisterMiyagi Jun 23 '16 at 21:31
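
For the per-bucket-set salt the question asks about, one possible approach is a keyed hash instead of the built-in `hash()`. The sketch below is not from the thread. It assumes the key has already been converted to bytes somehow, and that a random salt is generated once per bucket set and stored next to `bucket_count` in the persistent metadata. `new_salt` and `salted_bucket_index` are hypothetical names.

```python
import hashlib
import hmac
import os
import struct


def new_salt(size=16):
    # Generate once per bucket set and persist it with the metadata
    # (assumption: the metadata store can hold a few extra bytes).
    return os.urandom(size)


def salted_bucket_index(key_bytes, salt, bucket_count):
    # HMAC-SHA256 keyed with the bucket set's own salt: stable across
    # interpreter invocations, yet different for each bucket set.
    digest = hmac.new(salt, key_bytes, hashlib.sha256).digest()
    # Interpret the first 4 bytes as an unsigned 32-bit big-endian integer.
    (value,) = struct.unpack(">I", digest[:4])
    return value % bucket_count
```

With this design each bucket set carries its own salt, so one application can work with several bucket sets at once without touching PYTHONHASHSEED.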