
I am using uthash.h as a hash table implementation in C for a basic word-count exercise: I have a file containing words and need to count the frequency of each word. The implementation of uthash.h requires me to generate an integer id for each entry, so I wanted to calculate a unique integer corresponding to each string. I tried the MD5 hash algorithm, but it generates strings of digits and letters, so it is of no use here. Can anybody suggest a suitable algorithm?

Love Bisaria
  • A good implementation of the md5 hash should be able to give you the raw 16-byte array. Split this into four 32-bit integers and XOR them together. That alphanumeric string is just a convenient representation for displaying the hash. – Gareth A. Lloyd Feb 20 '15 at 21:43
  • See http://stackoverflow.com/questions/16521148/string-to-unique-integer-hashing and http://stackoverflow.com/questions/1010875/string-to-integer-hashing-function-with-precision. – Mihai8 Feb 20 '15 at 21:44
  • @user1929959, the second link that you mentioned has hashing functions that return unsigned long values, but in the `uthash.h` implementation the id needs to be an integer. I am not sure whether this will work or not. I will try this approach and post my results once done. In the meantime, if you have any more suggestions, please post them. – Love Bisaria Feb 20 '15 at 21:56
  • I heard [murmur3](http://en.wikipedia.org/wiki/MurmurHash) is pretty good for strings. – Niklas B. Feb 20 '15 at 22:18
  • And please don't use md5 or any other cryptographic hash function for this. Their computation is *much* slower than good non-cryptographic hash functions. – Niklas B. Feb 20 '15 at 22:19
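
The folding described in the first comment can be sketched as follows. This is only an illustration under assumptions: the function name `fold_digest16` is hypothetical, the digest is assumed to be a raw 16-byte array produced elsewhere, and the bytes are assembled in little-endian order (any fixed order would do). Folding keeps the result within 32 bits but does not make it unique.

```c
#include <stdint.h>

/* Hypothetical helper: fold a raw 16-byte digest into one 32-bit value
   by XOR-ing its four 32-bit words together. */
uint32_t fold_digest16(const unsigned char digest[16])
{
    uint32_t folded = 0;
    int w;

    for (w = 0; w < 4; w++) {
        const unsigned char *p = digest + 4 * w;
        /* Assemble the word byte by byte (little-endian order, an
           arbitrary choice) so the result does not depend on the host. */
        uint32_t word = (uint32_t)p[0]
                      | ((uint32_t)p[1] << 8)
                      | ((uint32_t)p[2] << 16)
                      | ((uint32_t)p[3] << 24);
        folded ^= word;
    }
    return folded;
}
```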

1 Answer


Use Robert Sedgewick's algorithm for hashing:

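/* Folds each of the len characters of str into a running unsigned value. */
unsigned int GenerateHash(const char* str, unsigned int len)
{
   unsigned int result = 0;
   unsigned int b    = 378551;   /* multiplier applied to a after each character */
   unsigned int a    = 63689;    /* per-character multiplier */
   unsigned int i    = 0;

   for(i=0; i<len; str++, i++)
   {
      result = result*a + (*str);
      a = a*b;                   /* unsigned arithmetic wraps around on overflow */
   }

   return result;                /* different strings can still collide */
}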
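
For the word-count exercise, note that no hash into a fixed-width integer can be unique for every string, so the stored word still has to be compared on lookup. A minimal sketch of how the hash might be used this way is below. It does not use uthash; `WordEntry`, `NUM_BUCKETS`, `count_word` and `lookup_count` are hypothetical names introduced for illustration, and `GenerateHash` is repeated only so the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_BUCKETS 1024u   /* hypothetical table size */

/* Same hash as above, repeated so this sketch is self-contained. */
unsigned int GenerateHash(const char *str, unsigned int len)
{
    unsigned int result = 0, b = 378551, a = 63689, i;

    for (i = 0; i < len; str++, i++) {
        result = result * a + (*str);
        a = a * b;
    }
    return result;
}

/* One chained entry per distinct word. */
typedef struct WordEntry {
    char *word;
    unsigned int count;
    struct WordEntry *next;
} WordEntry;

static WordEntry *buckets[NUM_BUCKETS];

/* Increment the count for word and return the new count (0 if out of memory). */
unsigned int count_word(const char *word)
{
    assert(word != NULL);
    size_t len = strlen(word);
    unsigned int idx = GenerateHash(word, (unsigned int)len) % NUM_BUCKETS;
    WordEntry *e;

    /* Colliding words share a bucket, so compare the actual strings. */
    for (e = buckets[idx]; e != NULL; e = e->next) {
        if (strcmp(e->word, word) == 0) {
            return ++e->count;
        }
    }

    e = malloc(sizeof *e);
    if (e == NULL) {
        return 0;
    }
    e->word = malloc(len + 1);
    if (e->word == NULL) {
        free(e);
        return 0;
    }
    memcpy(e->word, word, len + 1);   /* copies the terminating '\0' too */
    e->count = 1;
    e->next = buckets[idx];
    buckets[idx] = e;
    return 1;
}

/* Return how many times word has been counted so far (0 if never seen). */
unsigned int lookup_count(const char *word)
{
    unsigned int idx = GenerateHash(word, (unsigned int)strlen(word)) % NUM_BUCKETS;
    const WordEntry *e;

    for (e = buckets[idx]; e != NULL; e = e->next) {
        if (strcmp(e->word, word) == 0) {
            return e->count;
        }
    }
    return 0;
}
```

Calling `count_word` once per word read from the file and then `lookup_count` (or a walk over `buckets`) yields the frequencies; entries are never freed in this sketch.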
mprivat