Reason behind JVM's default Object.HashCode() implementation

Question

I am trying to understand why JVM's default implementation does not return same hashcode() value for all the objects...

I have written a program where i have overridden equals() but not hashCode(), and the consequences are scary.

HashSet is adding two objects even the equals are same.
TreeSet is throwing exception with Comparable implementation..

And many more..

Had the default Object'shashCode() implementation returns same int value, all these issues could have been avoided...

I understand their's alot written and discussed about hashcode() and equals() but i am not able to understand why things cant be handled at by default, this is error prone and consequences could be really bad and scary..

Here's my sample program..

import java.util.HashSet;
import java.util.Set;


public class HashcodeTest {

    public static void main(String...strings ) {

        Car car1 = new Car("honda", "red");
        Car car2 = new Car("honda", "red");
        Set<Car> set = new HashSet<Car>();

        set.add(car1);
        set.add(car2);
        System.out.println("size of Set : "+set.size());
        System.out.println("hashCode for car1 : "+car1.hashCode());
        System.out.println("hashCode for car2 : "+car2.hashCode());

    }

}

class Car{
    private String name;
    private String color;



    public Car(String name, String color) {
        super();
        this.name = name;
        this.color = color;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getColor() {
        return color;
    }
    public void setColor(String color) {
        this.color = color;
    }


    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Car other = (Car) obj;
        if (color == null) {
            if (other.color != null)
                return false;
        } else if (!color.equals(other.color))
            return false;
        if (name == null) {
            if (other.name != null)
                return false;
        } else if (!name.equals(other.name))
            return false;
        return true;
    }

}

Output:

size of Set : 2

hashCode for car1 : 330932989

hashCode for car2 : 8100393

If the default implementation of `hashCode` always returned the same value by default, you would get terrible performance from hash-based data structures. — Andy Turner, Feb 19 '16 at 10:02
Agree but it would be better than the wrong or incorrect functionality... — Rupesh, Feb 19 '16 at 10:04
Possible duplicate of [Why do I need to override the equals and hashCode methods in Java?](http://stackoverflow.com/questions/2265503/why-do-i-need-to-override-the-equals-and-hashcode-methods-in-java) — Andy Turner, Feb 19 '16 at 10:04
I believe you should be functionally correct before you look for the performance — Rupesh, Feb 19 '16 at 10:04
The default functionality *is* correct, though, *and* gives good performance in hash-based data structures. It is the fact that you have overridden `equals` without overriding `hashCode` consistently that makes it functionally incorrect. — Andy Turner, Feb 19 '16 at 10:06
@Andy, i dont think so, I understand the functionality but want to understand the "why?" part of it — Rupesh, Feb 19 '16 at 10:06
@Rupesh The default `hashCode()` implementation is designed to work with the default `equals(Object)` implementation. `equals(Object)` defaults to a test of object identity (e.g. `this == o` if `o` is the parameter). The default `hashCode()` implementation is similarly based on object identity, and will both work with the default `equals()` implementation and provide good performance. — Kevin, Feb 19 '16 at 22:42
Most classes don't need to override `equals(Object)` or `hashCode()`, because they can only be overriden meaningfully in value classes. If Object's `hashCode()` method simply returned a constant, essentially every class would have to override `hashCode()` to avoid pathological performance in hash tables. As it is, just by doing nothing, you get acceptable performance in hash tables. And, in the rare case you do forget to override `hashCode()`, the compiler should warn you. — Kevin, Feb 19 '16 at 22:46

Krzysztof Cichocki · Answer 1 · 2016-02-19T10:09:31.033

3

You broke the contract.

hashcode and equals should be written in such a way, that when equals return true these objects has same hashcode.

If you override equals then you must provide hashcode that works properly.

Default implementation can't handle it, because default implementation don't know which fields are important. And automatic implementation would not do it in efficient way, the hashcode function is to speed up operations like data lookup in data structures, if it is implemented improperly, then performance will suffer.

edited Feb 19 '16 at 10:09

answered Feb 19 '16 at 09:59

Krzysztof Cichocki

6,294
1
16
32

Yes that i understand and agree but i broke it unknowingly... why the default implementation cannot or should not handle it. – Rupesh Feb 19 '16 at 10:00
The contract that `Object` obeys by having `equals()` as `a == b` which guarantees that `a.hashCode() == b.hashCode()`. – Kayaman Feb 19 '16 at 10:01

Tagir Valeev · Accepted Answer · 2016-02-19T10:46:23.313

It seems that you want to propose to calculate hashCode by default just by taking all the object fields and combining their hashCodes using some formula. Such approach is wrong and may lead to many unpleasant circumstances. In your case it would work, because your object is very simple. But real life objects are much more complex. A few examples:

Objects are connected into double-linked list (every object has previous and next fields). How default implementation would calculate the hashCode? If it should check the fields, it will end up with infinite recursion.
Ok, suppose that we can detect infinite recursion. Let's just have single-linked list. In this case the hashCode of every node should be calculated from all the successor nodes? What if this list contains millions of nodes? All of them should be checked to generate the hashCode?
Suppose you have two HashSet objects. First is created like:
```
HashSet<Integer> a = new HashSet<>();
a.add(1);
```
The second is created like this:
```
HashSet<Integer> b = new HashSet<>();
for(int i=1; i<1000; i++) b.add(i);
for(int i=2; i<1000; i++) b.remove(i);
```
From user's point of view both contain only one element. But programmatically the second one holds big hash-table inside (like array of 2048 entries of which only one is not null), because when you added many elements, the hash-table was resized. In contrast, the first one holds small hash-table inside (e.g. 16 elements). So programmatically objects are very different: one has big array, other has small array. But they are equal and have the same hashCode, thanks to custom implementation of hashCode and equals.
Suppose you have different List implementations. For example, ArrayList and LinkedList. Both contain the same elements and from the user's point of view they are equal and should have the same hashCode. And they indeed equal and have the same hashCode. However their internal structure is completely different: ArrayList contains an array while LinkedList contains pointers to the objects representing head and tail. So you cannot just generate the hashCode based on their fields: it surely will be different.
Some object may contain the field which is lazily initialized (initialized to null and calculated from other fields only when necessary). What if you have two otherwise equal objects and one has its lazy field initialized while other is not? We should exclude this lazy field from hashCode calculation.

So, there are many cases when universal hashCode approach would not work and may even produce problems (like making your program crash with StackOverflowError or stuck enumerating all the linked objects). Due to this the simplest implementation was selected which is based on object identity. Note that the contract of hashCode and equals requires them to be consistent, and it's fulfilled by default implementation. If you redefine equals, you just must redefine hashCode as well.

you just nailed it, thanks for sharing your analysis. – Rupesh Feb 19 '16 at 10:46 — Rupesh, Feb 19 '16 at 10:46

score 0 · Answer 3 · answered Feb 19 '16 at 10:02

From the Docs

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)

score 0 · Answer 4 · answered Feb 19 '16 at 10:02

From documentation:

If two objects are equal according to the equals(Object) method, then calling the hashCode} method on each of the two objects must produce the same integer result.

then if you overrides how equals() behave, you must override hashCode() as well.

Also, from docs of equals() -

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

score 0 · Answer 5 · answered Feb 19 '16 at 10:11

From javadoc of Object class:

Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.

Thus if default implementation provides the same hash, it defeats the purpose.

And for a default implementation, it cannot assume all the classes are of value class, thus the last sentence from doc:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

Reason behind JVM's default Object.HashCode() implementation

5 Answers5