I am using `uthash.h` for a hash table implementation in C. I am using the hash table for a basic word-count exercise: I have a file containing words, and I have to count the frequency of each word. The `uthash.h` implementation requires me to generate an integer id for each entry, so I want to calculate a unique integer corresponding to each string. I tried the MD5 hash algorithm, but it produces strings of digits and letters, so it's no use. Can anybody suggest such an algorithm?
Viewed 2,103 times

Love Bisaria
-
A good implementation of the MD5 hash should be able to give you the raw 16-byte array. Split this into four 32-bit integers and XOR them together. That alphanumeric string is just a convenient representation for displaying the hash. – Gareth A. Lloyd Feb 20 '15 at 21:43
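A minimal sketch of that folding step, assuming the raw 16-byte digest has already been produced by whatever MD5 library is in use (no MD5 code is included here). The name `fold_digest16` is hypothetical, and reading each 4-byte group little-endian is an arbitrary choice made so the result does not depend on the host's byte order.

```c
#include <stdint.h>

/* Hypothetical helper: fold a 16-byte digest (e.g. raw MD5 output) into
   one 32-bit value by XOR-ing its four 32-bit words. Each word is read
   little-endian so the result is the same on every platform. */
uint32_t fold_digest16(const uint8_t digest[16])
{
    uint32_t folded = 0;
    for (int w = 0; w < 4; w++) {
        const uint8_t *p = digest + 4 * w;
        uint32_t word = (uint32_t)p[0]
                      | ((uint32_t)p[1] << 8)
                      | ((uint32_t)p[2] << 16)
                      | ((uint32_t)p[3] << 24);
        folded ^= word;
    }
    return folded;
}
```

Folding keeps the digest's mixing but, like any 32-bit value, it can still collide for different inputs.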
-
See http://stackoverflow.com/questions/16521148/string-to-unique-integer-hashing and http://stackoverflow.com/questions/1010875/string-to-integer-hashing-function-with-precision. – Mihai8 Feb 20 '15 at 21:44
-
@user1929959, the second link you mentioned has hashing functions that return `unsigned long` values, but in the `uthash.h` implementation the id needs to be an `int`. I am not sure whether this will work. I will try this approach and post my results once done. In the meantime, if you have any more suggestions, please post them. – Love Bisaria Feb 20 '15 at 21:56
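If a hash function returns `unsigned long` but the key has to be an `int`, one option is to fold the wider value down and mask it into the non-negative `int` range. This is a sketch only: the helper name `hash_to_int_key` is hypothetical, and any such reduction discards bits, so it can only add collisions, never make keys unique. (uthash also provides string-key macros such as `HASH_ADD_STR` and `HASH_FIND_STR`, which may remove the need for an integer id altogether.)

```c
#include <limits.h>

/* Hypothetical helper: reduce an unsigned long hash to a non-negative int.
   When unsigned long is 64 bits, the high half is XOR-ed into the low half
   first. Shifting twice by 16 avoids undefined behaviour when unsigned long
   is only 32 bits wide; in that case the shift yields 0 and the XOR is a
   no-op. */
int hash_to_int_key(unsigned long h)
{
    h ^= (h >> 16) >> 16;
    /* INT_MAX has the form 2^k - 1, so the masked value always fits in int */
    return (int)(h & INT_MAX);
}
```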
-
I heard [murmur3](http://en.wikipedia.org/wiki/MurmurHash) is pretty good for strings. – Niklas B. Feb 20 '15 at 22:18
-
And please don't use md5 or any other cryptographic hash function for this. Their computation is *much* slower than good non-cryptographic hash functions. – Niklas B. Feb 20 '15 at 22:19
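As a sketch of the MurmurHash3 suggestion, here is a transcription of its 32-bit x86 variant (MurmurHash3_x86_32) with a caller-chosen seed. The function name `murmur3_32` is hypothetical; 4-byte blocks are read little-endian explicitly, on the assumption that you want the x86 reference output regardless of host byte order.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* Final avalanche step: spreads every input bit across the output. */
static uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

uint32_t murmur3_32(const void *key, size_t len, uint32_t seed)
{
    const uint8_t *data = (const uint8_t *)key;
    const size_t nblocks = len / 4;
    const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
    uint32_t h1 = seed;

    /* Body: mix each full 4-byte block into the running hash. */
    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = data + 4 * i;
        uint32_t k1 = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                    | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        k1 *= c1;
        k1 = rotl32(k1, 15);
        k1 *= c2;
        h1 ^= k1;
        h1 = rotl32(h1, 13);
        h1 = h1 * 5 + 0xe6546b64u;
    }

    /* Tail: mix the remaining 1-3 bytes. */
    const uint8_t *tail = data + nblocks * 4;
    uint32_t k1 = 0;
    switch (len & 3) {
    case 3: k1 ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k1 ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k1 ^= tail[0];
            k1 *= c1;
            k1 = rotl32(k1, 15);
            k1 *= c2;
            h1 ^= k1;
    }

    /* Finalization: fold in the length, then avalanche. */
    h1 ^= (uint32_t)len;
    return fmix32(h1);
}
```

For a word, call it as `murmur3_32(word, strlen(word), 0)`; any fixed seed works as long as it stays the same for the whole run.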
1 Answer
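Use Robert Sedgewick's string hashing algorithm:

```c
#include <stddef.h>

/* Robert Sedgewick's string hash: a polynomial hash whose multiplier
   changes at every step. Unsigned arithmetic wraps modulo UINT_MAX + 1,
   which is well-defined in C. */
unsigned int GenerateHash(const char *str, size_t len)
{
    unsigned int result = 0;
    unsigned int b = 378551;
    unsigned int a = 63689;
    size_t i;

    for (i = 0; i < len; str++, i++)
    {
        /* Cast so bytes >= 0x80 are not sign-extended where char is signed. */
        result = result * a + (unsigned char)(*str);
        a = a * b;
    }
    return result;
}
```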

mprivat
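Whatever hash is chosen, it maps arbitrarily many strings into at most 2^32 values (for a 32-bit `unsigned int`), so no function can give every word a unique integer; two different words can land on the same hash. A word-count table therefore still needs to store each word and compare it on lookup. The sketch below illustrates this with a small hand-rolled chained hash table built on the Sedgewick hash above (renamed `rs_hash`). It does not use uthash, and the names `WordCount`, `wc_add`, `wc_get`, `wc_count_text`, `wc_free` and the bucket count `NBUCKETS` are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024u /* arbitrary table size for this sketch */

typedef struct WordCount {
    char *word;
    unsigned int count;
    struct WordCount *next; /* chains words that fall into the same bucket */
} WordCount;

/* Sedgewick's hash, as in the answer. */
static unsigned int rs_hash(const char *str, size_t len)
{
    unsigned int result = 0, a = 63689, b = 378551;
    for (size_t i = 0; i < len; i++) {
        result = result * a + (unsigned char)str[i];
        a *= b;
    }
    return result;
}

/* Increment the count for word[0..len), inserting it if new.
   Returns the updated count, or 0 if allocation fails. */
static unsigned int wc_add(WordCount *table[], const char *word, size_t len)
{
    unsigned int idx = rs_hash(word, len) % NBUCKETS;
    for (WordCount *e = table[idx]; e != NULL; e = e->next) {
        /* Equal hashes do not prove equal words: compare the text itself. */
        if (strlen(e->word) == len && memcmp(e->word, word, len) == 0)
            return ++e->count;
    }
    WordCount *e = malloc(sizeof *e);
    if (e == NULL) return 0;
    e->word = malloc(len + 1);
    if (e->word == NULL) { free(e); return 0; }
    memcpy(e->word, word, len);
    e->word[len] = '\0';
    e->count = 1;
    e->next = table[idx];
    table[idx] = e;
    return 1;
}

/* Look up a NUL-terminated word; returns 0 if it was never seen. */
static unsigned int wc_get(WordCount *table[], const char *word)
{
    size_t len = strlen(word);
    for (WordCount *e = table[rs_hash(word, len) % NBUCKETS]; e; e = e->next)
        if (strcmp(e->word, word) == 0) return e->count;
    return 0;
}

/* Split text on non-alphabetic characters and count each word. */
static void wc_count_text(WordCount *table[], const char *text)
{
    const char *p = text;
    while (*p) {
        while (*p && !isalpha((unsigned char)*p)) p++;
        const char *start = p;
        while (*p && isalpha((unsigned char)*p)) p++;
        if (p > start) wc_add(table, start, (size_t)(p - start));
    }
}

/* Release every entry and reset the buckets. */
static void wc_free(WordCount *table[])
{
    for (unsigned int i = 0; i < NBUCKETS; i++) {
        WordCount *e = table[i];
        while (e) { WordCount *next = e->next; free(e->word); free(e); e = next; }
        table[i] = NULL;
    }
}
```

Words are compared case-sensitively here; lowercasing each word before `wc_add` would make "The" and "the" count together. Reading the input file into a buffer is left out.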