3

I recently ran across a problem on leetcode which I solved with a nested hashset. This is the problem, if you're interested: https://leetcode.com/problems/group-anagrams/.

My intuition was to add all of the letters of each word into a hashset, then put that hashset into another hashset. At each iteration, I would check if the hashset already existed, and if it did, add to the existing hashset.

Oddly enough, that seems to work. Why do 2 hashsets share the same hashcode if they are different objects? Would something like if(set1.hashCode() == set2.hashCode()) doStuff() be valid code?

Tydal
  • 75
  • 4
  • I don't know why @Kaan deleted their answer. The requirement for hashCode() states that if two objects are equal, they must have the same hashCode. The sets are equal. – daniu Jul 20 '22 at 05:30
  • I've tried adding the same array to two different sets and printing their hashcodes, the values are the same. I even tried changing the values in the array, the hashcodes are still the same. – Arun Sudhakaran Jul 20 '22 at 05:32
  • 1
    @daniu deleted because I misread the original question. ;-) I just posted an answer though that should address the question. – Kaan Jul 20 '22 at 05:50
  • 1
    In the case of HashSet (others, too, but didn't verify whether _all_ implementations of Set), the result of `hashCode()` is calculated by adding up the hashCode of each object in the set. It doesn't matter the order / sequence, it's just addition. So if two sets each contain object hashCodes which sum up to X, then the hashCode() for the sets would be the same. – Kaan Jul 20 '22 at 05:52
  • 2
    Please clarify what you mean for the code to be valid? If this code is meant to be an equality check, then it is certainly not valid. `hashCode()` does not guarantee that 2 objects with the same hash code are equal. – Chaosfire Jul 20 '22 at 06:59
  • 1
    `hashCode` and `equals` should go by the internal content. Where equal _hash codes_ still might happen for _unequal_ objects. – Joop Eggen Jul 20 '22 at 14:10

2 Answers2

6

This is expected. HashSet extends AbstractSet. The hashCode() method in AbstractSet says:

Returns the hash code value for this set. The hash code of a set is defined to be the sum of the hash codes of the elements in the set, where the hash code of a null element is defined to be zero. This ensures that s1.equals(s2) implies that s1.hashCode()==s2.hashCode() for any two sets s1 and s2, as required by the general contract of Object.hashCode.

This implementation iterates over the set, calling the hashCode method on each element in the set, and adding up the results.

Here's the code from AbstractSet:

public int hashCode() {
    int h = 0;
    Iterator<E> i = iterator();
    while (i.hasNext()) {
        E obj = i.next();
        if (obj != null)
            h += obj.hashCode();
    }
    return h;
}

Why do 2 hashsets share the same hashcode if they are different objects?

With HashSet, the hashCode is calculated using the contents of the set. Since it's just numeric addition, the order of addition doesn't matter – just add them all up. So it makes sense that you have two sets, each containing objects which are equivalent (and thus should have matching hashCode() values), and then the sum of hashCodes within each set is the same.

Would something like if(set1.hashCode() == set2.hashCode()) doStuff() be valid code?

Sure.

EDIT: The best way of comparing two sets for equality is to use equals(). In the case of AbstractSet, calling set1.equals(set2) would result in individual calls to equals() at the level of the objects within the set (as well as some other checks).

Kaan
  • 5,434
  • 3
  • 19
  • 41
  • That is so elegant. It makes sense that the hashCodes of the elements could be added to construct the final hashCode. It would do that calculation efficiently, too. – Tydal Jul 20 '22 at 06:56
  • Well, it's not completely safe to use `hasCode`. As Chaosfire and Joop Eggen already said in the comments [⁽¹⁾](https://stackoverflow.com/questions/73046230/why-do-two-different-hashsets-with-the-same-data-have-the-same-hashcode#comment129014125_73046230) [⁽²⁾](https://stackoverflow.com/questions/73046230/why-do-two-different-hashsets-with-the-same-data-have-the-same-hashcode#comment129024449_73046230), while `hashCode` returns the same value for equal objects, they *may* also return the same value for *unequal* objects. Why don't you just use `equals`? – MC Emperor Jul 20 '22 at 14:13
  • In my original answer, re: the question _"would something like X be valid code?"_, I was narrowly answering whether it would be valid or not, not whether it's the right or best way. It's an important point to clarify that the right way to compare sets is to use `equals()` (and I just made a small addition at the end of my answer to include that). – Kaan Jul 20 '22 at 17:51
2

Why do two different HashSets with the same data have the same HashCode?

Actually this is needed to fulfill another need that is specified in Java.

The equals method of Set is overridden to take in consideration that equals returns true (example a.equals(b)) if:

  • a is of type Set and b is of type Set.
  • both a and b have exactly the same size.
  • a contains all elements of b.
  • b contains all elements of a.

Since the default equals (which compares only the memory reference to be the same) is overridden for Set, according to java guidelines the hashCode method has to be overridden as well. So, this custom implementation of hashCode is provided in order to match with the custom implementation of equals.

In order to see why it is necessary to override hashCode method when the equals method is overridden, you can take a look at this previous answer of mine.

Why do 2 hashsets share the same hashcode if they are different objects

Because as explained above this is needed so that Set can have the custom functionality for equals that it currently has.

If you want to just check if a and b are different instances of set you can still check this with operators == and !=.

a == b -> true means a and b point to the same instance of Set in memory

a != b -> true means a and b point to different instances of Set in memory

Panagiotis Bougioukos
  • 15,955
  • 2
  • 30
  • 47