
As we know, the recommended way to create a hash code from a long value in Java is

(int) (this.longValue()^(this.longValue()>>>32))

However, another possible way to create a hash code from a long value is simply

(int) this.longValue() 

If we're considering all the longs from Long.MIN_VALUE to Long.MAX_VALUE together, the function

(int) this.longValue()

will basically generate the following values:

  1. the range will be [Integer.MIN_VALUE; Integer.MAX_VALUE]

  2. I'm not 100% sure, but the values would probably be evenly distributed (a quick empirical check is sketched after this list)
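
For what it's worth, here is a minimal sketch of the kind of check I have in mind; the sample size, the seed and the bucket count are arbitrary choices of mine, not anything prescribed:

import java.util.Random;

public class IntCastDistribution {
    public static void main(String[] args) {
        // Tally how (int) value spreads a random sample of longs across buckets.
        int bucketCount = 16;            // arbitrary, just for illustration
        long[] buckets = new long[bucketCount];
        Random rnd = new Random(42);     // fixed seed so the run is repeatable

        for (int i = 0; i < 1_000_000; i++) {
            long value = rnd.nextLong();
            int hash = (int) value;      // the "just take the low 32 bits" hash
            // Clear the sign bit so the bucket index is non-negative.
            buckets[(hash & Integer.MAX_VALUE) % bucketCount]++;
        }

        for (int b = 0; b < bucketCount; b++) {
            System.out.printf("bucket %2d: %d%n", b, buckets[b]);
        }
    }
}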

So if we're not talking about special cases like storing two ints in one long and treating the upper and lower halves separately, why do we end up using bitwise operations instead of simply taking the int part?

Are there any guidelines? Or maybe there is some underlying theory?

A similar question has already been asked in Bit-shifting in Effective Java hashCode() implementation, but to quote from there:

You could just take the bottom 32 bits - but then that means changes in only the top 32 bits would be ignored, which wouldn't make it a very good hash.

So I'm particularly interested in why taking the bottom 32 bits 'wouldn't make it a very good hash'.
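
To make that objection concrete, here is a minimal sketch (the two sample values are mine): two longs that differ only in the upper 32 bits collide under the plain cast but are distinguished by the shift-and-xor fold.

public class HighBitsIgnored {
    // Variant A: just take the low 32 bits.
    static int lowBitsHash(long v) {
        return (int) v;
    }

    // Variant B: fold the high half into the low half, as java.lang.Long.hashCode does.
    static int foldedHash(long v) {
        return (int) (v ^ (v >>> 32));
    }

    public static void main(String[] args) {
        long a = 0x00000001_00000005L;   // differs from b only in the upper 32 bits
        long b = 0x00000002_00000005L;

        System.out.println(lowBitsHash(a) == lowBitsHash(b)); // true  -> collision
        System.out.println(foldedHash(a) == foldedHash(b));   // false -> distinguished
    }
}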

Igor Kustov
  • It's not the 'recommended' way. It is the way that is specified for `java.lang.Long`. You're free to do it any other way that works for you and satisfies the contract. – user207421 Oct 27 '15 at 09:25
  • Ok, but I'm interested in what assumptions they used to arrive at the bitwise formula. It doesn't look like it was pulled out of thin air. – Igor Kustov Oct 27 '15 at 09:32
  • It's all in your citations. Ignoring the top 32 bits can't be as good as taking them into account. Surely this is obvious? – user207421 Oct 27 '15 at 09:37
  • It's pretty obvious that it's better not to ignore something than to ignore it; I absolutely agree with you. However, if we take all longs, and this hash function is designed for all longs, we are going to get all Integer values and an even distribution anyway. – Igor Kustov Oct 27 '15 at 09:43
  • Of course, but you're going to get a better distribution. As a *reductio ad absurdum,* consider just using 1 bit to generate the hashcode. You'll get a lovely even distribution between zeros and ones, but not much discrimination. – user207421 Oct 27 '15 at 09:49
  • I continue being slow here, sorry. I'll try to rephrase: for me, an even distribution means the following. If you go from Long.MIN_VALUE to Long.MAX_VALUE consecutively, hash each value to get an int, and then put each int into a HashMap (let's say a really big HashMap with [Integer.MAX_VALUE - Integer.MIN_VALUE] buckets), then an 'even distribution' is when you end up with approximately the same number of values in each bucket. If that distribution is somehow skewed, then the hash function is bad. – Igor Kustov Oct 27 '15 at 10:03
  • So is the default hash implementation less skewed than just taking (int) values? And how can we measure this skew without going through all the inputs? – Igor Kustov Oct 27 '15 at 10:03
  • No single hashing function fits all needs. Sometimes casting to int is good, sometimes `high ^ low` is good. It depends on the distribution of the data. Designing a hashing function for a specific problem is just as important as designing an efficient algorithm for that problem. – piotrekg2 Oct 27 '15 at 10:56
  • @EJP: what about the following uniform distribution (assuming 4 bit hashes for simplification, high and low parts are separated by '|'): 00|00, 01|01, 10|10, 11|11, other numbers have probability eq. to 0. I can build a set of 2^(N/2) of such numbers where N is the length of a hash in bits. – piotrekg2 Oct 27 '15 at 11:06
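
A small sketch of the last comment's point, scaled up to 64-bit longs (the sample values are synthetic): when the data happens to consist of values whose upper and lower halves are equal, the fold maps all of them to 0 while the plain cast still tells them apart, which is why the "right" hash depends on the expected input distribution.

public class XorFoldCollisions {
    public static void main(String[] args) {
        // Values whose upper 32 bits equal their lower 32 bits.
        for (long low = 0; low < 8; low++) {
            long value = (low << 32) | low;              // upper half == lower half
            int folded = (int) (value ^ (value >>> 32)); // always 0 for these values
            int lowBits = (int) value;                   // still distinguishes them
            System.out.printf("%016x -> folded: %d, low bits: %d%n",
                              value, folded, lowBits);
        }
    }
}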
