Hashing composite objects

Question

EDIT: This question is not about bitwise operators and can't be answered with Why are XOR often used in java hashCode() but another bitwise operators are used rarely?

I've seen different approaches for hash calculation of object:

class A {
  public B b;
  public C c;

  @Override
  public boolean equals();
  @Override
  public int hashCode() {
   return c.hashCode() ^ b.hashCode(); //XOR
   return c.hashCode() + prime * b.hashCode(); // SUM
   return Objects.hash(b,c); // LIB
  }
}

It seems LIB method uses SUM, but why is it better than XOR?

Even though the example is in Java, this question is more about math and probabilities.

Normally, just use the lib functions. Unless you are going to run a probability distribution analysis to determine how your data points are best distributed. Are you finding a lot of collisions with your data set? — CodeMonkeyForHire, Jun 25 '13 at 12:23
http://stackoverflow.com/questions/2334218/why-are-xor-often-used-in-java-hashcode-but-another-bitwise-operators-are-used — assylias, Jun 25 '13 at 12:25
Josh Bloch discusses a good hash code implementation in *Effective Java*. — Edward Thomson, Jun 25 '13 at 14:21

score 5 · Accepted Answer · answered Jun 25 '13 at 12:24

The SUM ensures that you use all the bits of the hashcode to spread your hashing (in this, the 32 bits of an int), and makes no assumption about sub hashcode() implementation for that.

The XOR only has the same property if the hashcode of B and C has it, else it will only use the minimum of the number of "useful" bits in B and C hashcode, which could lead to a worse distribution, and more frequent collision. It's very easy to see the problem if B and C are integers which tends to be very small, you'll only ever use the first few bits (as int.hashcode() is the identity function).

Adam Siemion · Answer 2 · 2013-06-25T12:38:58.967

0

This is because sum provides gives better distribution than xor.

For example, if int a and b have values between 0 and 7 (000 and 111 binary) then the result of xor of these two arguments will always be between 0 and 7 (as xor will change only 3 bits). Now when you do a multiplication and a sum you will have a lot better distribution as the values will not be within the 0 and 7 range.

edited Jun 25 '13 at 12:38

answered Jun 25 '13 at 12:19

Adam Siemion

15,569
7
58
92

By the way is int hashCode its value? It would be very bad for non-uniform distributions for most use-cases, which is bad for HashMap and other hash based algorithms. – Basilevs Jun 25 '13 at 12:25
Depends of the implementation ^^ but the answer is, sadly, often yes. – C4stor Jun 25 '13 at 12:28
@Basilevs Yes, I meant wider, better, fixed the answer, thanks. – Adam Siemion Jun 25 '13 at 12:41

score -1 · Answer 3 · answered May 28 '18 at 08:09

-1

The answer is (as always): "It depends." It depends on your class.

For example if you consider

class X {
    T a, b;
    X(T _a, _b) { a = _a; b = _b }
}

you would not use a symmetric operator like +, *, or ^ (Imagine T is int, and you are hashing X(1,2) and X(2,1). Obviously the hash code should be different. So the first of the three "solutions" (xor hash values) would be bad).

If T is a complex type, the third solution (Objects.hash()) would be possibly bad too, because only the references are considered (equal objects could return different hash codes).

answered May 28 '18 at 08:09

U. Windl

3,480
26
54

What is a complex type? Why would an equal object produce different hash code? – Basilevs May 28 '18 at 12:27
@Basilevs: A _complex_ type obviously is a non-primitive type, i.e.: a true _reference type_. I don't know why you downvote this when you don't understand what I wrote. – U. Windl Jun 25 '18 at 11:49
1. misuse of term "complex type" (which has no formal definiton in CS and may refer, for example to a complex number) 2. implied violation of hashCode() + equals() contract Where my understanding is lacking? – Basilevs Jun 25 '18 at 12:07
For me, especially when discussing a inconsistent language like Java, this is like splitting hairs: Whether _Atomic_ or intrinsic_ or _primitive_, it's all one part, while _complex_, _composite_ is the other one. In Eiffel there are just _expanded_ types and _reference_ types. And there are very clear contracts regarding equality and hash code, which are absent in Java (And I believe that's the reason for most mess in Java). – U. Windl Jun 25 '18 at 13:17
Most of all, "*If `T` is a complex type, the third solution (Objects.hash()) would be possibly bad too, because only the references are considered (equal objects could return different hash codes).*" says it all: Equal objects can have different references, that `Objects.hash(...)` considers. Thus when passing equal objects with different references, different hash codes may result. That's what I wrote, and I think it's correct. – U. Windl Jun 25 '18 at 13:17
3. [Objects.hash()](https://docs.oracle.com/javase/7/docs/api/java/util/Objects.html#hash(java.lang.Object...)) only hashes references for arrays , given there are no arrays in your example, this argument is not applicable. – Basilevs Jun 25 '18 at 13:46
More generally only objects that use default hashCode implementation are subject for identity hashing. Such objects are out of scope of this question. – Basilevs Jun 25 '18 at 15:06

Hashing composite objects

3 Answers3