0
int hash (const string &key, int tableSize) {
   int hashVal = 0; 

   for (int i = 0; i < key.length(); i++)
        hashVal = 37*hashVal + key[i]; 
   hashVal %= tableSize; 
   if (hashVal < 0)   /* in case overflows occurs */
        hashVal += tableSize; 

   return hashVal;      
};

Why do we control if hashVal is smaller than zero? How is this possible?

Ilmari Karonen
  • 49,047
  • 9
  • 93
  • 153
user1559792
  • 347
  • 2
  • 6
  • 15
  • It is possible because signed integer overflow results in undefined behaviour. – Mankarse Dec 30 '12 at 12:34
  • Why is it become a negative number when exceed the limit of integer? – user1559792 Dec 30 '12 at 12:38
  • Read this http://en.wikipedia.org/wiki/Two's_complement – asheeshr Dec 30 '12 at 12:43
  • 0111 -> 1000 : If the data type size were 4 bits, then 0111 would represent the maximum positive value followed by 1000 which represents the minimum value(negative in case of signed types) of the range as per 2's complement notation(which is the most commonly used way to represent numbers in memory). – asheeshr Dec 30 '12 at 12:45
  • 1
    @user1559592: Becoming negative something is within the allowed bounds of "undefined behaviour". It is likely that the compiled code uses a [two's-compliment](http://en.wikipedia.org/wiki/Two's_complement) representation for numbers, and so numbers over the maximum positive number become negative. This cannot be relied on (even when you know the target architecture, you do not know about every possible assumption in the code generator), nor is it particularly relevant. The behaviour is undefined, so this code should be rewritten (unless there is a *very* good reason for leaving it in). – Mankarse Dec 30 '12 at 12:47

5 Answers5

2

You can get overflow in the variable hashVal. This (sometimes) results in a negative value. For example, try to print the value of 3 * 1000 * 1000 * 1000 in a C++ program:

std::cout << 3 * 1000 * 1000 * 1000;

On my computer, and with my compiler, this prints -1294967296.

What happens is that the result, 3000000000, is 10110010110100000101111000000000 in binary, but since integers are 32 bits on this particular platform, and we use the two's-complement method to represent negative numbers, this bit pattern represents a negative number.

The standard defines integer overflow as undefined behaviour, so actually anything could happen, but this is the typical effect.

Thomas Padron-McCarthy
  • 27,232
  • 8
  • 51
  • 75
2

If the string is long enough, the code:

for (int i = 0; i < key.length(); i++)
    hashVal = 37*hashVal + key[i]; 

might cause the value of hashVal to exceed the maximum value of an int (typically something like 231 − 1) and become negative. This is known as integer overflow.

The C++ standard does not specify whether the value of the % operator for negative operands should be positive or negative; thus, depending on your compiler and CPU architecture (and possibly compile-time switches), an expression like -47 % 37 may evaluate to either -10 or 27. Thus, the code you've quoted guards against the former possibility by adding the modulus to the result if it's negative.

By the way, an easier way to avoid this issue would have been to define hashVal as unsigned.

Community
  • 1
  • 1
Ilmari Karonen
  • 49,047
  • 9
  • 93
  • 153
0

If key is long enough, hashVal value may become negative. You can experiment with strings of different length (for example "1", "11", "111", "1111" etc) to see where hashVal will become negative (about 5-7 characters should be enough).

Then you try to get modulo of negative number, which will be also negative. But you can't point to negative array index (it seems, this function calculates position for string to be stored in), so you make it positive and appropriate to be an array index.

AlexErofeev
  • 161
  • 6
0

hashVal gets bigger and bigger very fast in the for loop, it can easily gets bigger than the biggest signed int value, which is platform dependent. If hashVal were negative after the for loop, it may still be negative after %= operator, which is also platform dependent(in some case, it always return nonegative values, while it may return negative as well) then, you need to check whether hashVal is negative afterwards.

flyingfoxlee
  • 1,764
  • 1
  • 19
  • 29
0

Try calling your hash function the following way

hash("HelloHello",100);

And then step through the program or print a message in the hash function to see if hash ever goes below 0.

For eg, in the for loop you can put

if(hashVal < 0)
{
    cout<<"OVERFLOW HAS HAPPENED\n";
    break;
}

And you will see hashVal going below 0.

user93353
  • 13,733
  • 8
  • 60
  • 122