What happens if we override only hashCode() in a class and use it in a Set?

Question

This may not be the real world scenario but just curious to know what happens, below is the code.

I am creating a set of object of class UsingSet. According to hashing concept in Java, when I first add object which contains "a", it will create a bucket with hashcode 97 and put the object inside it. Again when it encounters an object with "a", it will call the overridden hashcode method in the class UsingSet and it will get hashcode 97 so what is next?

As I have not overridden equals method, the default implementation will return false. So where will be the Object with value "a" be kept, in the same bucket where the previous object with hashcode 97 kept? or will it create new bucket? anybody know how it will be stored internally?

/* package whatever; // don't place package name! */

import java.util.*;
import java.lang.*;
import java.io.*;

class UsingSet {  

  String value;  

  public UsingSet(String value){  
    this.value = value;  
  }  

  public String toString() {  
    return value;  
  }  

  public int hashCode() {  
    int hash = value.hashCode();  
    System.out.println("hashcode called" + hash);  
    return hash;  
  }  

  public static void main(String args[]) {  

    java.util.Set s = new java.util.HashSet();  

    s.add(new UsingSet("A"));  
    s.add(new UsingSet("b"));  
    s.add(new UsingSet("a"));  
    s.add(new UsingSet("b"));   
    s.add(new UsingSet("a"));  

    s.add(new Integer(1));  
    s.add(new Integer(1));  

    System.out.println("s = " + s); 

  }  
}

output is:

hashcode called65
hashcode called98
hashcode called97
hashcode called98
hashcode called97
s = [1, b, b, A, a, a]

Internally a `HashMap` is used to store the values of a `HashSet`. Also, there is a method `hash()` under `HashMap` which applies a supplemental hash function to defend against poor hashCodes. — TheLostMind, Oct 10 '14 at 14:55

score 25 · Answer 1 · answered Apr 07 '16 at 04:03

HashCode & Equals methods

Only Override HashCode, Use the default Equals: Only the references to the same object will return true. In other words, those objects you expected to be equal will not be equal by calling the equals method.
Only Override Equals, Use the default HashCode: There might be duplicates in the HashMap or HashSet. We write the equals method and expect{"abc", "ABC"} to be equals. However, when using a HashMap, they might appear in different buckets, thus the contains() method will not detect them each other.

score 17 · Accepted Answer · edited May 23 '17 at 11:54

17

James Large answer is incorrect, or rather misleading (and part incorrect as well). I will explain.

If two objects are equal according to their equals() method, they must also have the same hash code. If two objects have the same hash code, they do NOT have to be equal too.

Here is the actual wording from the java.util.Object documentation:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

It is true, that if two objects don't have the same hash then they are not equal. However, hashing is not a way to check equality - so it is wildly incorrect to say that it is a faster way to check equality.

Also, it is also wildly incorrect to say the hashCode function is an efficient way to do anything. This is all up to implementation, but the default implementation for hashCode of a string is very inefficient as the String gets large. It will perform a calculation based on each char of the String, so if you are using large Strings as keys, then this becomes very inefficient; moreso if you have a large number of buckets.

In a Map (HashSet uses a HashMap internally), there are buckets and in each bucket is a linked list. Java uses the hashCode() function to find out which bucket it belongs in (it actually will modify the hash, depending on how many buckets exist). Since two objects may share the same hash, it will iterate through the linked list sequentially next, checking the equals() method to see if the object is a duplicate. Per the java.util.Set documenation:

A collection that contains no duplicate elements.

So, if its hashCode() leads it to a bucket, in which that bucket contains an Object where the .equals() evaluates to true, then the previous Object is overwritten with the new Object. You can probably view here for more information: How does a Java HashMap handle different objects with the same hash code?

Generally speaking though, it is good practice that if you overwrite the hashCode function, you also overwrite the equals function (if I'm not mistaken, this breaks the contract if you choose not to).

edited May 23 '17 at 11:54

Community

1
1

answered Oct 10 '14 at 15:16

searchengine27

1,449
11
26

You have mentioned that "if its hashCode() leads it to a bucket, in which that bucket contains an Object where the .equals() evaluates to true, then the previous Object is overwritten with the new Object". But i want to know, what will happen "if its hashCode() leads it to a bucket, in which that bucket contains an Object where the .equals() evaluates to (false)". I want to know how the two objects will be stored in the same bucket internally? – kumar Oct 13 '14 at 17:27
1

Well as I said, each bucket is a linked list. The hashCode is truncated with respect to the number of buckets to make it more efficient in conjunction to the number of elements stored. That means the smaller the number of elements stored, the more likely you will receive a collision. When the map find the bucket belonging to a hash it will iterate through each element in the linked list. If it gets to the end of the linked list and there are no matching elements, it will put the new element at the end. If the map reaches its load factor, then it refactors the entire map and increases the – searchengine27 Oct 13 '14 at 17:41
number of buckets. This is a little off topic, but the load factor is by default 75%. Also, off topic again, this is why you want to set an initial capacity that is reflected with the number of total elements you think will be in the map, because it gets increasingly expensive to refactor the map as the number of elements stored gets larger and larger. You want to set the initial capacity just over the maximum number of elements you think will be stored to alleviate this cost later on. – searchengine27 Oct 13 '14 at 17:43
Re, "hashing is not a way to check equality...": You can save a lot of time by comparing hash codes before calling .equals(). That's exactly what HashMap.get(k) does in OpenJDK 7 (Look up the source code.) It always compares hash codes first, and if the hash codes of the given key and the key in the table are not equal, then it will not bother to call the .equals() method. That saves a lot of time, especially for immutable objects (e.g., Strings) that only compute their own hash code once, and then just return the same value on subsequent .hashCode() call. – Solomon Slow Oct 20 '14 at 00:35
Re "the default implementation for hashCode of a string is very inefficient", The first call to String.hashCode() computes a hash code based on every character of the string. And then it _saves_ it. Each subsequent call to s.hashCode() simply returns the saved value. For a String that is used as a HashMap key, the hashCode() method may be called many times. Of course, the _other_ string might be a new string each time, but computing the hash code (i.e., fetching the bytes from) one string still is cheaper than fetching the bytes from two strings in order to compare them. – Solomon Slow Oct 20 '14 at 00:40
Re "[HashMap] uses the hashCode() function to find out which bucket [the key] belongs in [...then] it will iterate through the linked list sequentially next, checking the equals() method": Actually, It will compare hashCodes before it calls equals(), and it won't call equals() if the hash codes don't match. This makes sense because two keys in the same bucket may have different hash codes. There are 2^32 possible hash codes, but any given HashMap probably has a much smaller number of buckets. – Solomon Slow Oct 20 '14 at 00:49
@ArunRaaj The first snippet I copied and pasted was right from the `java.lang.Object` documentation which dictates the contract. – searchengine27 Feb 06 '18 at 16:03

Hasnain Ali Bohra · Answer 3 · 2017-05-08T04:44:02.740

Simply you can Assume hashcode and equals methods as a 2D search like:-

Where Hashcode is the Rows and the object list is the Column. Consider the following class structure.

public class obj
  {
    int Id;
    String name;
    public obj(String name,int id)
     {
         this.id=id;
         this.name=name;
     }
  }

now if you create the objects like this:-

obj obj1=new obj("Hassu",1);
obj obj2=new obj("Hoor",2);
obj obj3=new obj("Heniel",3);
obj obj4=new obj("Hameed",4);
obj obj5=new obj("Hassu",1);

and you place this objects in map like this :-

    HashMap hMap=new HashMap();
   1. hMap.put(obj1,"value1");
   2. hMap.put(obj2,"value2");
   3. hMap.put(obj3,"value3");
   4. hMap.put(obj4,"value4");
   5. hMap.put(obj5,"value5");

now if you have not override the hashcode and equals then after putting all the objects till line 5 if you put obj5 in the map as By Default HashCode you get different hashCode so the row(Bucket will be different). So in runtime memory it will be stored like this.

|hashcode   | Objects   
|-----------| --------- 
|000562     | obj1 
|000552     | obj2 
|000588     | obj3
|000546     | obj4
|000501     | obj5

Now if you create the same object Like :- obj obj6 = new obj("hassu",1); And if you search for this value in the map.like

if(hMap.conaints(obj6)) 
or 
hMpa.get(obj 6);

though the key(obj1) with the same content is available you will get false and null respectively. Now if you override only equals method. and perform the same content search key will also get the Null as the HashCode for obj6 is different and in that hashcode you wont find any key. Now if you override only hashCode method.

You will get the same bucket (HashCode row) but the content cant be checked and it will take the reference checked implementation by Super Object Class. SO here if you search for the key hMap.get(obj6) you will get the correct hashcode:- 000562 but as the reference for both obj1 and obj6 is different you will get null.

Solomon Slow · Answer 4 · 2014-10-20T03:21:16.180

Without looking at your code...

The whole point of hash codes is to speed up the process of testing two objects for equality. It can be costly to test whether two large, complex objects are equal, but it is trivially easy to compare their hash codes, and hash codes can be pre-computed.

The rule is: If two objects don't have the same hash code, that means they are not equal. No need to do the expensive equality test.

So, the answer to the question in your title: If you define an equals() method that says object A is equal to object B, and you define a hashCode() method that says object A is not equal to object B (i.e., it says they have different hash codes), and then you hand those two objects to some library that cares whether they are equal or not (e.g., if you put them in a hash table), then the behavior of the library is going to be undefined (i.e., probably wrong).

Added information: Wow! I really missed seeing the forest for the trees here---thinking about the purpose of hashCode() without putting it in the context of HashMap. If m is a Map with N entries, and k is a key; what is the purpose of calling m.get(k)? The purpose, obviously, is to search the map for an entry whose key is equal to k.

What if hash codes and hash maps had not been invented? Well the best you could do, assuming that the keys have a natural, total order, is to search a TreeMap, comparing the given key for equality with O(log(N)) other keys. In the worst case, where the keys have no order, you would have to compare the given key for equality with every key in the map until you either find a match or tested them all. In other words, the complexity of m.get(k) would be O(N).

When m is a HashMap, the complexity of m.get(k) is O(1), whether the keys can be ordered or not.

So, I messed up by saying that the point of hash codes was to speed up the process of testing two objects for equality. It's really about testing an object for equality with a whole collection of other objects. That's where comparing hash codes doesn't just help a little; It helps by orders of magnitude...

...If the k.hashCode() and k.equals(o) methods obey the rule: j.hashCode()!=k.hashCode() implies !j.equals(k).

I recommend revising your answer or removing it. Most of what you said is wrong, if not all of what you said. EDIT: yes, all of what you said is wrong. — searchengine27, Oct 19 '14 at 17:48
@searchengine27, OK, so you're saying that it's never costly to compare large, complex objects for equality, that it's always hard to compare hash codes, and that it's never possible/useful for an object to cache its hash code after computing it once. You are saying that it's OK for two objects that have different hash codes to test as equal, and that if you put objects like that into a HashTable, the HashTable will do something predictable and useful with them. Or, I guess, maybe, "all of what you said is wrong" could have some new meaning that I haven't heard about yet. — Solomon Slow, Oct 19 '14 at 23:52
read my answer I posted. I wanted to address *some* things you mentioned in my answer, but I inadvertently addressed everything wrong about your answer in my answer. The summation is: you need to read the documentation. As far as comparing equality of a string, it will compare each char in the string which is a single operation per char. The hashCode will perform "h = 31*h + val[off++];" (which is 5 operations) on each char. That means, hashCode is ALWAYS 5 times more expensive than the equals. This is irrelevant anyway, because the documentation doesn't say that equal hashCodes mean equality. — searchengine27, Oct 20 '14 at 00:04
I just looked up the source code for String.hashCode() in OpenJDK 7. The first time you call it, it does what you said, but then it also saves the hash value in an instance variable. On each subsequent call, it just returns the saved value. Trivial. Oh, and I did not say that equal hashCodes are supposed to imply equality: I said that _unequal_ hash codes are supposed to imply _non_ equality. — Solomon Slow, Oct 20 '14 at 00:23
@searchengine27 re "5 operations". The way you're counting, String.equals() does not do _one_ operation per character. It does five: It fetches one char from each array, it increments two indices, and it compares the chars. But "operations" is not the right thing to count: Fetches cost a lot more than adds, increments, and multiplications. What you _should_ be pointing out is that `s.hashCode()` has to examine the _whole string_, but `s.equals(t)` stops as soon as it finds a difference (e.g., it will fetch only one char from each string if the first chars are different) — Solomon Slow, Oct 20 '14 at 03:44

score 0 · Answer 5 · answered Oct 11 '14 at 18:42

0

Set will behave differently.

Uniqueness wont happen. Because unique will be achieved by both hashcode and equals methods.
output will be liked this s = [A, a, b, 1] instead of early one.

Apart that remove and contains all wont work.

answered Oct 11 '14 at 18:42

Siva Kumar

1,983
3
14
26

What happens if we override only hashCode() in a class and use it in a Set?

5 Answers5

Linked