2

I need to create two independent hash functions to implement a bloom filter in java.

These two hash functions h_1(x) and h_2(x) will be used to simulate additional hash functions when needed.

I understand how to create a basic hash function like this:

hash function h(x) = x mod M, where M represent the hash table size and is a prime number.

Given a string x containing characters c_i : x <--> c_1, c_2, ... , c_n, x = c_1.c_2.c_3...c_n (. for concatenation)

Each character will be converted to ASCII code E{0, .. ,127} then multiply by a different constant for each characters here 128^n-1 to 128^0.

This way string containing the same characters in different order won't hash to same value.

x = c_1*128^n-1 + c_2*128^n-2 + ... + c_n*128^0

How can I create a second hash function that is independent of this one?

Would changing the constants be sufficient?

How can I verify that they are indeed independent?

  • I think "independent" here means that one doesn't call the other. – Sweeper Jul 22 '18 at 01:17
  • Can you please explain what do you mean by 'independent'? To propose one straightforward answer, use any two hash functions out of [multiple available](https://en.wikipedia.org/wiki/List_of_hash_functions). – displayName Jul 22 '18 at 02:12
  • @displayName I asked this same question on Reddit and someone informed me that functions are independent by definition. If this is right i'm thinking it only means two different hash functions, in this case your proposition seems correct. – Phenotype Alpha Jul 22 '18 at 03:51
  • See https://stackoverflow.com/a/24685697/56778. When I was working with Bloom filters, that technique worked quite well. – Jim Mischel Jul 23 '18 at 03:41
  • Please create a [MVCE](https://stackoverflow.com/help/mcve) of the problem so others can best assist you. – Kristopher Ives Jul 24 '18 at 09:23

1 Answers1

1

A fast bloom filter implementation doesn't typically use two fully independent hash functions. Instead, it uses one good hash function that returns enough bits, and creates multiple hash values from that. For example: use Murmur3 hash to create a 128 bit hash value. From that, use the lower 64 bits and the higher 64 bits as follows:

h(x) = (higher + x * lower) mod M

This is how the Google Guava Bloom filter implementation currently works.

M doesn't necessarily need to be a prime number in practise, even thought it doesn't hurt. (In theory, it should be one.)

As for the reduction step: mod M could be replaced with a multiply and shift.

Your hash function uses 128^n. It's probably much better to use a well known hash functions such as Murmur 3.

Thomas Mueller
  • 48,905
  • 14
  • 116
  • 132