2

I am trying to implement a randomized algorithm called tidemark. It is described in Unit 2 of these lecture notes. To do this I need a randomly chosen hash function that maps the integers [1,...,n] to [1,...,n]. Python has a few different hash function in libraries but I didn't find one that would enable me to specify the domain and range and would choose a suitable function at random.

Does such a thing exist?

Simd
  • 19,447
  • 42
  • 136
  • 271
  • 1
    The notes you linked use a "hash function" in specific sense, namely a function from a [2-universal](https://en.wikipedia.org/wiki/Universal_hashing) hash family. If you don't know what that means, it may be best to follow what the paper says before proceeding: "If you are unfamiliar with the concept [of a 2-universal hash family], working through Exercises 2-1 and 2-2 is strongly recommended." Those exercise describe exactly what sort of function you need here. A randomly chosen function may not be adequate. – Brian61354270 Apr 20 '21 at 14:22
  • @Brian I do know that that means. I just wanted an easy way to implement it in python without reinventing the wheel. – Simd Apr 20 '21 at 14:23

3 Answers3

3

Well, off the top of my head, I'd piggyback off Python's hash() but twist the numbers it returns using a random number. (Note that hash() doesn't have a deterministic value between Python interpreter restarts either.)

import secrets

def gen_random_hasher(max_val=1024):
    seed = secrets.randbits(64)
    return lambda val: (hash(val) ^ seed) % max_val

s1 = gen_random_hasher()
s2 = gen_random_hasher()

print(s1('aaa'), s1(123))
assert s1('aaa') == s1('aaa')  # "prove" the function is deterministic
print(s2('aaa'), s2(123))

This prints out e.g.

447 885
55 765

so you can tell s1 and s2 are different.

AKX
  • 152,115
  • 15
  • 115
  • 172
  • `(hash(val)^seed)%max_val` does not generate independent hash functions if `max_val` is a power of two, which could be a common occurrence. If `val` is an integer you could do `hash(val^seed)%max_val` instead provided that `hash()` is a good hash function, which it is not in my installation, where it seems to be the identity function. – Wolfgang Brehm Apr 29 '21 at 16:48
  • 1
    @WolfgangBrehm `hash()` is an identity function for (small enough) integers, but not for anything else: https://github.com/python/cpython/blob/76cd81d60310d65d01f9d7b48a8985d8ab89c8b4/Objects/longobject.c#L2915 – AKX Apr 29 '21 at 17:28
  • 1
    That could still be a problem for OP, as he wants to hash integers. – Wolfgang Brehm Apr 29 '21 at 17:43
1

What you are looking for is a universal hash function. Many hashing algorithms use a random number internally that can be used to generalize them to a universal hash function.

You can also use any integer hash function and then multiply with a random integer larger than 0 before the modulus reduction.

Depending on how well the values before hashing are distributed and how well the hash values need to be distributed this is probably entirely sufficient:

from functools import partial
from random import randint

def hash_family(value,n,salt):             # output is [0,n) (i.e. excluding n)
  value=value*(2*salt+1)                   # multiplying with 0 is bad
  value=value^(value>>(n.bit_length()//2)) # xorshift to improve distribution
  value=value&((1<<n.bit_length())-1)      # cut off high bits for speed
  value=value*(2*salt+1)                   # another multiplication
  value=value^(value>>(n.bit_length()//2)) # another xorshift
  value=value&((1<<n.bit_length())-1)      # cut off high bits
  value=value%n                            # modulus reduction
  return value

random_hash = partial(hash_family,salt=randint(0,2**64))
>>> hash_family(23,100,56)
73
>>> hash_family(23,100,57)
52
>>> hash_family(23,100,58)
30
>>> random_hash(17,100)
42

For cryptographic strength use pythons hashlib, many of the hash functions there have a salt parameter that can be set with randint and bound using partial to use it as a universal hash function. Range reduction can be done with the modulo operation.

Wolfgang Brehm
  • 1,491
  • 18
  • 21
0

Use standard md5 hash and use os.urandom as the random seed, and the hash function can be reused through the seed

def get_random_hashfunc(_max=1024, seed=None):
    seed = seed or os.urandom(10)
    seed_hash_func = hashlib.md5(seed)
    def hashfunc(n):
        func = seed_hash_func.copy()
        func.update(n.to_bytes(n.bit_length(), 'big'))
        return int.from_bytes(func.digest(), 'big') % _max
    return hashfunc, seed

hash_func1, seed1 = get_random_hashfunc()
hash_func2, seed2 = get_random_hashfunc()
hash_func3, seed3 = get_random_hashfunc(seed=seed1)

>>> hash_func1(123)
156
>>> hash_func2(123)
931
>>> hash_func3(123)
156
scp10011
  • 136
  • 3