1

I'm working on a hash function which gets a string as input.

Right now I'm doing a loop and inside the hash (an int variable) is being multiplied by a value and then the ASCII code for the current character is added to the mix.

hash = hash * seed + string[i]

But sometimes, if the string is big enough there is an integer overflow there, what can I do to avoid it while maintaining the same hash structure? Maybe a bit operation included inside the loop?

Jonathan
  • 11
  • 1
  • 3
    Why do you need to avoid overflow? The only critical feature of a hash function is that for any given data, the hash function gives a consistent result. Sure, collision avoidance is nice, but not critical. – torak May 04 '10 at 19:07
  • If hash*seed causes an integer overflow, and string[i] is positive, there's no way it won't cause an overflow, no matter which way you try to do it. Do you mean you want to limit the hash to a max value, via the modulo operator? – bobDevil May 04 '10 at 19:09
  • @torak: Signed integer overflow causes undefined behaviour in C, which means that correct programs must take care to avoid it. – caf May 05 '10 at 00:09
  • @caf: Integer overflow is only undefined for signed ints. Unsigned is well defined and acceptable. – corsiKa May 05 '10 at 14:22
  • @glowcoder: Hence the word "Signed" in my comment ;) – caf May 05 '10 at 22:51

4 Answers4

1

Hash functions like this one are supposed to overflow. You have to declare "hash" unsigned. If you really need an int than simply use hash & 0x7fffffff. Review the Fowler-Noll-Vo algorithm, you'll find links to source code there.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
1

There are a number of possible interpretations of your question, and as noted by the comments, you may need to clarify.

The only sensible interpretation however is that you want to restrict the hash value to a specified range. Assuming that, then if the range were 0 to HASH_TABLE_SIZE - 1, then:

hash = (hash * seed + string[i]) % HASH_TABLE_SIZE ;

or if the table size is a power of two, use a mask:

#define HASH_TABLE_SIZE (0x01<<8)  // 2^8 (256) table
#define HASH_MODULO_MASK (HASH_TABLE_SIZE - 1)
...
hash = (hash * seed + string[i]) & HASH_MODULO_MASK ;
Clifford
  • 88,407
  • 13
  • 85
  • 165
0

Why not use long to store the result? You can then apply techniques such as this one to detect the overflow

Community
  • 1
  • 1
DVK
  • 126,886
  • 32
  • 213
  • 327
0

If you have access to a larger data type, you can do something like this:

int32_t hash, seed;
int64_t temporary;

temporary = hash * seed + string[i];
hash = ( temporary >> 32 ) ^ ( temporary & 0xFFFFFFFF );

Otherwise you will have to manually multiply the hash and seed into two values, add string[i] with overflow, then ^ the two values.

Hashes are implicitly lossy, so it should be fine to just let the overflow bits go unless there is a specific reason you need them, like matching an existing algorithm.

drawnonward
  • 53,459
  • 16
  • 107
  • 112