7

I am trying to test out how String hashcode works by adding a new character and do the reverse to subtract it out, however it didn't give me the right answer when i did the calculation.

For example

int PRIME = 31;

//this will print 1687846330 
System.out.println("mlkjihgfedcb".hashCode());   

//this will print 783628775 which equals to "mlkjihgfedcba".hashCode();
System.out.println(("mlkjihgfedcb".hashCode() * PRIME + (int) 'a')); 

//this will print 25278344 which doesn't equals to "mlkjihgfedcb".hashCode()
System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') / PRIME); 

I am wondering did i do the math right on my last step above ? overflow shouldn't matter for the calculation right ?

peter
  • 8,333
  • 17
  • 71
  • 94

3 Answers3

11

Hash functions are not reversible functions. As proof, consider the Pigeonhole principle, you can only store values in the integer range in a hashCode but there are an infinite number of Strings. Therefore, there must be multiple strings with the same hashCode (and there are).

Elliott Frisch
  • 198,278
  • 20
  • 158
  • 249
5

Multiplying a big number by 31 leads to integer overflow, which cannot be reversed using division by 31.

But hold on though: arithmetic using int in Java is equivalent to arithmetic modulo 232. Since 31 is odd it has a multiplicative inverse modulo 232. Therefore multiplication by 31 can be reversed with multiplication by the said "inverse." That is:

int inverse = -1108378657;
// This will print the hashCode of mlkjihgfedcb
System.out.println(("mlkjihgfedcba".hashCode() - (int) 'a') * inverse); 
Joni
  • 108,737
  • 14
  • 143
  • 193
  • how do we get or calculate the inverse number ? – peter Dec 27 '13 at 22:53
  • You can use the extended Euclidean algorithm, or use [Hensel's lemma](http://en.wikipedia.org/wiki/Hensel%27s_lemma#Hensel.27s_Lemma_for_p-adic_Numbers). The latter is easiest to implement: first set `x = a = 31` and then repeat `x = x*(2-a*x)` until it converges, up to five times. – Joni Dec 27 '13 at 23:05
  • But you don't have to calculate it, it's -1108378657 – Joni Nov 26 '18 at 14:00
2

The specific reason this fails is

1687846330 * 31 + 97 = 52323236327

The number 52,323,236,327 is much larger than what an int can hold, so information is lost off the left-hand end. That information cannot be recovered by inverting the hashCode() algorithm.

If you do the arithmetic you'll see that the hash code you got for the second string is exactly 52323236327 mod 231

More important, the mapping from objects to hash codes is intended to be opaque and non-reversible. Just because it's implemented in a specific way today does not mean the next version of the library has to do it the same way. The only guarantee is that the possibility of collision is extremely low (but non-zero).

Jim Garrison
  • 85,615
  • 20
  • 155
  • 190
  • I don't know any way the behavior of the string's hash function could change without breaking any `switch` statements that use a string argument, since the compiled code includes hardcoded hash values for the strings. – supercat Dec 30 '13 at 20:35