8

I am using the djb2 algorithm to generate the hash key for a string which is as follows

hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

Now with every loop there is a multiplication with two big numbers, After some time with the 4th of 5th character of the string there is a overflow as the hash value becomes huge

What is the correct way to refactor so that the hash value does not overflow and the hashing also happens correctly

Lukáš Lalinský
  • 40,587
  • 6
  • 104
  • 126
Jainish
  • 89
  • 1
  • 1
  • 2
  • 1
    There is not such thing as DJB2 hash, there is the only the standard DJB, and then Salsa20 et al. –  Dec 16 '10 at 00:52
  • 4
    http://www.cse.yorku.ca/~oz/hash.html refers to DJB2, I believe the terminology is widely used, if not formally recognized. – yoyo Dec 11 '12 at 21:19

4 Answers4

21

Hash calculations often overflow. That's generally not a problem at all, so long as you have guarantees about what's going to happen when it does overflow. Don't forget that the point of a hash isn't to have a number which means something in terms of magniture etc - it's just a way of detecting equality. Why would overflow interfere with that?

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • To put it more precise: the multiplication with 33 (the shift and addition) is *expected* to overflow. And so is the addition of c. That's part of the specification, so to speak. – Sven Aug 10 '19 at 15:50
5

You shouldn't do that. Since there is no modulo, integer overflow is the expected behavior for the function (and it was designed with it in mind). Why do you want to change it?

Lukáš Lalinský
  • 40,587
  • 6
  • 104
  • 126
4

I'm thinking your using a static/runtime analyser to warn about integer overflows? Well this is one of those cases where you can ignore the warning. Hash functions are designed for specific types of properties, so don't worry about the warnings from your analyser. Just don't try to create a hash function yourself!

Davy Landman
  • 15,109
  • 6
  • 49
  • 73
1

return (hash & 0xFFFFFFFF); // or whatever mask you want, doesn't matter as long as you keep it consistent.

Duy Truong
  • 19
  • 1