1

I am trying to generate a hash code from two integer inputs. The approach outlined in

Combining Java hashcodes into a "master" hashcode

seems to work well for many input values. However, when one of the input integers is int.MinValue, the behavior seems less than ideal. Specifically I observe

int.MinValue * 1013 == int.MinValue

int.MinValue * 1009 == int.MinValue

but

int.MinValue * 2 == 0

int.MinValue * 20 == 0

All of this is in an unchecked context.

I would naively (and wrongly) assume that int.MinValue * (something other than 1 or 0) would yield a new bit pattern different than int.MinValue or 0.

Questions

  1. Why does multiplying int.MinValue by these constants yield int.MinValue (2 cases) or 0 (2 cases)?
  2. Does the behavior of int.MinValue indicate a flaw in the hash algorithm?
Community
  • 1
  • 1
Eric J.
  • 147,927
  • 63
  • 340
  • 553
  • It *might* indicate a flaw in *that* hash algorithm, but not all hash algorithms are implemented with only multiplication. In fact, take a look at how .NET usually does it, by adding in one prime number and multiplying with another. The addition might indicate a fix for this "flaw". – Lasse V. Karlsen Jul 01 '14 at 06:42
  • @LasseV.Karlsen: It seems this method of creating a hash is considered flawed relative to other alternatives. A Rotating Hash seems like a more appropriate choice http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx – Eric J. Jul 01 '14 at 19:43

1 Answers1

2

Multiplication is more or lest shift of bits to left. Since int.MinValue is 0x80000000 (only one highest bit set) multiplication can produce only two int values - 0 (if multiplying by even number ) or value with highest bit still set (for odd numbers).

Sample for 4 bit numbers (x,y,z - any value for particular bit, 1000 is equivalent of int.MinValue )

1000 * xyz1 =
   (xyz0 * 1000)  + 1000 * 1 = 
   (xyz  * 10000) + 1000 * 1 =  
   (xyz  * 0)     + 1000 = 1000

1000 * xyz0 = 
   (xyz  * 10000) + 1000 * 0 =  0
Alexei Levenkov
  • 98,904
  • 14
  • 127
  • 179
  • So this means that for prime P if P * inputValue < int.MinValue (when looking at arbitrary precision results), the final result would be int.MinValue (when looking at Int32 results), since P must be odd for P > 2? This seems to indicate a large number of hash collisions for values near int.MinValue. Would it be better to perform the math in a `long` then cast back to `int` before returning a hash value? – Eric J. Jul 01 '14 at 01:49
  • I'm not good with probabilities nor hashing expert, so I can't say how serious the problem is. To me it feels not very common - you either hash once (and should get random distribution) or combine multiple hash values - getting all hashes to be close to int.MinValue don't feel very likely. I've added "hashcode" tag so there are more useful related questions - check out http://stackoverflow.com/questions/34595/what-is-a-good-hash-function?rq=1 as it seem to have reasonable discussion/links on hashing in general. – Alexei Levenkov Jul 01 '14 at 06:27
  • I think for my purposes, the edge cases will not matter. For general purposes, I learned that this method of combining hashes is not very good and would use something more like a Rotating Hash http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx – Eric J. Jul 01 '14 at 19:42