0

Can anyone explain me the logic behind this hash-function?.

static int hash(int h) {
   h ^= (h >>> 20) ^ (h >>> 12);
   return h ^ (h >>> 7) ^ (h >>> 4);
}

I pass the key.hashCode() to this function which gives me the hash value. Based on this value and the Array size, i calculate the index of the array. I am just not understanding the operator used here in this method.

  1. What does this ^ operator do here in this context. Does it check for !=
  2. What does Unsigned right shift >>> do?. We don't have Unsigned int's in Java right?.
  3. How are the values 20, 12, 7 and 4 chosen for this function?. Is it pre defined or user defined?.

The key.hashCode() passed to this hash function is 79847235. Can anyone explain on what happens inside to return the final hash value.

Shane
  • 5,517
  • 15
  • 49
  • 79

2 Answers2

3

Have a look at the below:

Bitwise and Bit Shift Operators

~       Unary bitwise complement
<<      Signed left shift
>>      Signed right shift
>>>     Unsigned right shift
&       Bitwise AND
^       Bitwise exclusive OR
|       Bitwise inclusive OR

Related to the unsigned right shift see: unsigned right Shift '>>>' Operator in Java

Also, for >>> I think that:

The >>> operator shifts the bits of expression1 right by the number of bits specified in expression2. Zeroes are filled in from the left. Digits shifted off the right are discarded. So this shifting doesn't respect the sign.

So what does this function do...

  • (h >>> 20) divides h by 2 to the power of 20. (it shifts 20 times rightwards). Also,it means that if your number was negative, it will not continue to be so.
  • (h >>> 12) divides h by 2 to the power of 12. (it shifts 12 times rightwards)...again same with negatives.
  • The two results are then XORed,and then are XORed again with original H.
  • Following,more XORing goes on to calculate the final hash.

Edit: Noticed this has been extensively analyzed in the accepted answer to: Understanding strange Java hash function

Community
  • 1
  • 1
Menelaos
  • 23,508
  • 18
  • 90
  • 155
1

Unsigned right shift (>>>), unlike signed right shift (>>), will convert negative numbers into positive numbers before preforming the right shift operation ensuring that the result returns an unsigned positive integer. A right shift, for example h >>> 20, essentially means return the floored integer for h/Math.pow(2,20).
For example for your input 79847235, as it is a positive number, both unsigned right shift and signed right shift will return a positive integer. 79847235>>>20 will thus preform:
Math.floor(79847235/Math.pow(2,20)) which returns 76.
Next up h >>> 12 for 79847235 is:
Math.floor(79847235/Math.pow(2,12)) which returns 19493
(which is bigger because we are dividing by a smaller number).
Now we preform an exclusive OR on 76 and 19493.
For example 1^0 is 1
1^1 is 0
If we wanted the AND included, we'd have to use an inclusive OR which is |
Thus 1|1 is 1
1|0 is 0
etc

1001100 is the binary representation of 76
100110000100101 is the binary representation of 19493
An exclusive OR operation would look like this:

000000001001100 (76 in binary)
100110000100101 (19493 in binary)
---------------
100110001101001 (our result: 19561)

We're almost done for our first line:
We got this:
h ^= (h >>> 20) ^ (h >>> 12);
down to:
h ^= 19561

Which is the same as:
h = h^19561

Fill in our value for h which is 79847235:
79847235^19561 is 79827754
h = 79827754
It is important to remember that our new value for h is now 79827754


Next line:
return h ^ (h >>> 7) ^ (h >>> 4);

h>>>7 is Math.floor(79827754/Math.pow(2,7)) which returns 623654
h>>>4 is Math.floor(79827754/Math.pow(2,4)) which returns 4989234

Now that the parentheses are out of the way we have:
return h ^ 623654 ^ 4989234;
It is important to preform this left to right.
Fill in h and regroup:
return (79827754 ^ 623654) ^ 4989234;



79827754 ^ 623654 is:
100110000100001001100101010 (79827754 in binary)
000000010011000010000100110 (623654 in binary)
---------------------------
100110010111001011100001100 (80451340 in binary)

Finally we have:
return 80451340 ^ 4989234;

80451340 ^ 4989234 is:
100110010111001011100001100 (80451340 in binary)
000010011000010000100110010 (4989234 in binary)
---------------------------
100100001111011011000111110 (76002878 in binary)

Thus our final result is:
return 76002878;

Feel free to check my answer, I double checked my work already.
Due to the nature of exclusive bitwise ORs it can be very difficult to predict whether the result for the hash function will be greater or smaller than our parameter. For example:
11^2 is 9 (smaller than our parameter 11)
17^2 is 19 (greater than our parameter 17)

Ultimater
  • 4,647
  • 2
  • 29
  • 43
  • There's already the same question and an answer I linked to: http://stackoverflow.com/questions/9335169/understanding-strange-java-hash-function . Since this is also a duplicate I wonder why you answered.... – Menelaos Sep 26 '13 at 08:32
  • Ultimater, Thanks for your answer. I really would like to know, Why it is specially 20,12,7 and 4. How they measure the randomness. Most of the answers here say introduce randomness etc. How it is creating that randomness? Why the right shift positions has to be 20, why it cant be 21 or 19?. Can you please explain that too? Sorry if I am missing some basic stuff. – Rengasami Ramanujam May 05 '14 at 08:27
  • You could use other numbers too. If you ran every number 0 to 100 quadrillion through the function, you'd see there would be no repetitions in the output no matter what numbers you use in the function. This is because of the nature of exclusive OR. 2^5=7. 7^5=2. 2^7=5. If you ran `i^2` in a loop `for(var i=0;i<100;i++)` and stored the results in an array. You'd see the array would contain every number 0 through 99 with no repetitions. http://jsfiddle.net/e9v2D/. Thus no matter what numbers you used it wouldn't contain repetitions due to XOR's flip-flop behavior as seen with the 2,5, and 7. – Ultimater May 26 '14 at 22:26
  • Why specifically those numbers, the idea of the function is to flop the bits 12 - 20 as well as the bits 7 to 4 on the function parameter then return the result. The idea of the function is to avoid using `object.hashCode() % 20` (assuming you had 20 buckets) to decide which bucket to store to since there tends to be patterns with hashCode() which would make the bucket distribution less likely to be evenly distributed. That is where XOR's flip-flop behavior comes into play to help add a bit more randomness to the hashCode() before you do `% 20` to decide which bucket to send it to. – Ultimater May 26 '14 at 22:41