-1

I want to write a good integer hash function for a hash table. Even though I suspect my hash table will not be too large (say of size 36 elements), the "key" which generates the hash value can vary drastically the values ranging from 0,20, 31, .... 11456,13444 etc. Similar questions have been posted here before, and my hash function is inspired from on the proposed answers provided here.

Following is the structure of my table:

  typedef struct _list_t_ {
  int key;
  int value;
  struct _list_t_ *next;
  } list_t;

  typedef struct _hash_table_t_ {
  int size;       /* the size of the table */
  list_t **table; /* the table elements */
  } hash_table_t;

Following is my current hash function:

  unsigned int hash(hash_table_t *hashtable, int key)
  {
   unsigned int hashval;

   hashval = 0;
   hashval = key;
   hashval = ((hashval >> 16) ^ hashval) * 0x45d9f3b;
   hashval = ((hashval >> 16) ^ hashval) * 0x45d9f3b;
   hashval = ((hashval >> 16) ^ hashval);
   return hashval % hashtable->size; // MOD done to keep within the range of the table size
   }

As mentioned above the "key" which generates the hash value varies drastically (values ranging from 0,20, 31, .... 11456,13444 etc.). The problem is that I notice this hash function generating the same hash value very frequently. Is there a way I can tweak it so that the chances of ending with a new hash value are more.

Community
  • 1
  • 1
PGOnTheGo
  • 805
  • 1
  • 11
  • 25
  • 4
    It is hard to write a good hash function. Use an existing one that has been well tested. –  Jun 21 '13 at 16:10
  • There are a couple general purpose hash functions, and their implementations, [here](http://www.partow.net/programming/hashfunctions/) – Kninnug Jun 21 '13 at 16:13
  • Well it might be bad, but have you tested it objectively? If your stare at strong random output containing only 36 unique symbols you're bound to see repeating patterns in it. It's just how human brains work. It doesn't mean the hash is broken; it's just overly constrained by the output range. Also, of course, if the input isn't unique then the output _cannot_ be unique. – sh1 Jun 21 '13 at 23:13

1 Answers1

1
 unsigned int hash(hash_table_t *hashtable, int key)

This is a fairly rare opportunity to create a perfect hash function. A function that generates a unique value for every distinct input value. You cannot possibly do better. It is possible in this case because the number of input bits is equal to the number of output bits. Typical hash functions need to deal with many more input bits and a limited number of output bits. Which creates the inevitable problem of hash collisions. No such problem with a perfect hash.

The perfect hash function in this case, as always, is trivial:

 unsigned int getslot(hash_table_t *hashtable, int key)
 {
   return (unsigned)key % hashtable->size;
 }

Do note that you need to distinguish between the hash function and the code that maps the hash to a slot or bucket. I combined them in one function since they are so trivial and gave it a proper name. Also note that adding any entropy like you did is pointless, the result cannot be better distributed than the original. It only makes sense to do if you have more input values and they can be correlated.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • What if many of the keys are multiples of a factor of the size of the hashtable (for example multiples of 8 in hash table of size 128)? With this approach you *will* end up having many collisions, so scrambling the input bits is still essential. – Joni Jul 20 '13 at 08:47
  • One essential rule for any hash table implementation is that the number of slots is a prime. No literature about hashing ever fails to point this out. – Hans Passant Jul 20 '13 at 08:55
  • Many implementations in practice use a power of two or other composites to make resizing easier, for example HashMap from standard Java library. A decent hash code implementation make this a non-issue. – Joni Jul 20 '13 at 09:02
  • The usage of GetPrime() in [this implementation](http://www.dotnetframework.org/default.aspx/Net/Net/3@5@50727@3053/DEVDIV/depot/DevDiv/releases/whidbey/netfxsp/ndp/clr/src/BCL/System/Collections/Hashtable@cs/1/Hashtable@cs) is a good example. – Hans Passant Jul 20 '13 at 09:26