2

The documentation for HashSet.add says

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

Since my code below will return false for e.equals(e2), I'd expect it to let me add the same instance twice. But the set only contains my instance once. Can someone explain why?

package com.sandbox;

import java.util.HashSet;
import java.util.Set;

public class Sandbox {

    public static void main(String[] args) {
        Set<A> as = new HashSet<A>();
        A oneInstance = new A();
        System.out.println(oneInstance.equals(oneInstance));    //this prints false
        as.add(oneInstance);
        as.add(oneInstance);
        System.out.println(as.size());  //this prints 1, I'd expect it to print 2 since the System.out printed false
    }

    private static class A {
        private Integer key;

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof A)) {
                return false;
            }

            A a = (A) o;

            if (this.key == null || a.key == null) {
                return false;   //the key is null, it should return false
            }

            if (key != null ? !key.equals(a.key) : a.key != null) {
                return false;
            }

            return true;
        }

        @Override
        public int hashCode() {
            return key != null ? key.hashCode() : 0;
        }
    }

}
Daniel Kaplan
  • 62,768
  • 50
  • 234
  • 356

4 Answers4

11

HashSet (really HashMap under the hood) has an "optimization" in that it checks for object reference equality before calling the equals() method. since you are putting the same instance in twice, they are considered equal even though the equals() method does not agree.

Relevant line from HashMap.put():

        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
jtahlborn
  • 52,909
  • 5
  • 76
  • 118
  • @JackStraw It doesn't matter how many you assign, you'll get the same result. – Sotirios Delimanolis May 02 '13 at 17:22
  • 2
    @JackStraw - how does that change anything? copying a reference doesn't make it not the same instance. – jtahlborn May 02 '13 at 17:22
  • @jtahlborn The code given does not use HashMap.put(). It uses HashSet.add. Refer to Javadoc here: In HashSet.add if the element is already in the HashSet, it doesn't add it again. – Scott Shipp May 02 '13 at 18:15
  • 1
    @ScottShipp - `HashSet.add()` just calls `HashMap.put()` internally (as i said, HashSet is backed by HashMap). – jtahlborn May 02 '13 at 18:32
3

You're breaking the contract of Object.equals(Object), which starts with:

The equals method implements an equivalence relation on non-null object references:

  • It is reflexive: for any non-null reference value x, x.equals(x) should return true.

As your sample code says:

System.out.println(oneInstance.equals(oneInstance)); //this prints false

It seems that HashSet<E> (entirely reasonably) assumes reflexivity, and doesn't check for equality when it finds that the exact same object is already in the set, as an optimization. Therefore it will not even call your equals method - it considers that the object is already in the set, so doesn't add a second copy.

In particular, if x.equals(x) is false, then any containment check would also be useless.

I'd implement equals like this:

public boolean equals(Object o) {
    // Normal reflexive optimization
    if (this == o) {
        return true;
    }

    // "Correct type" check
    if (!(o instanceof A)) {
        return false;
    }
    
    A a = (A) o;

    // If both keys are null, the objects are equal. This is the most normal
    // approach; you *could* make non-identical objects with null keys non-equal,
    // but that would be odd.
    if (this.key == null && a.key == null) {
        return true;
    }

    // If exactly *one* key is null, the objects are not equal.
    if (this.key == null || a.key == null) {
        return false;
    }

    // By now we know that both keys are non-null; use normal equality.
    return this.key.equals(a.key);
}

Or if you're using Java 7:

public boolean equals(Object o) {
    // Normal reflexive optimization
    if (this == o) {
        return true;
    }

    // "Correct type" check
    if (!(o instanceof A)) {
        return false;
    }
    
    A a = (A) o;
    return Objects.equals(this.key, a.key);
}
Community
  • 1
  • 1
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • I hope you don't hate me for this. I said earlier that I'd mark this as the answer if you wrote it. But now I think that @jtahlborn is giving a more accurate reason for why this isn't working. You're pointing out a legitimate problem, but I don't think it's the cause of what I'm seeing. – Daniel Kaplan May 02 '13 at 17:21
  • 1
    @tieTYT: My answer effectively *includes* that though: "It seems that HashSet (entirely reasonably) assumes reflexivity, and doesn't check for equality when it finds that the exact same object is already in the set, as an optimization." My answer shows that you're violating the contract, and then why that causes HashSet a problem. – Jon Skeet May 02 '13 at 17:23
2

Hash maps/tables work by taking an object and 'hashing' it with a 'hash' function to produce a Psuedo Random Uniformly Distributed unique id representing the object where said id can be used as a key into an indexable structure like an array. Ideally you would have a perfect hash where each unique item produces a unique indexable id.

Obviously your array is fixed in size (you can grow the array but this will dramatically affect runtime performance) so at some point, if you continue to add elements to the Hash map/table you will eventually get 2 items with the same hash code and then you will have a collision; this is where equals comes into play.

When this occurs equality is used to disambiguate WHICH key/value you are seeking by iterating through (usually by storing a LinkedList at the index position and not just the element) the available objects and checking the equals method.

So, the problem for your case is easy: If your hash implementation is wrong then HashSet (which is backed by HashMap) fails to find your object in it's table and thus never bothers to call equals (have a look at HashMap.get() to see their implementation).

Whatever you use in equals MUST be used in hashCode() if you want this to work and vice versa. If you implement equals() it's a damn good idea to implement hashCode(). If you implement hashCode() then you MUST implement equals for hashing to actually work.

Christian Bongiorno
  • 5,150
  • 3
  • 38
  • 76
  • You haven't said *how* the hash is invalid anywhere. It's more the equality test which seems invalid to me. – Jon Skeet May 02 '13 at 17:32
  • (In fact, the hash looks fine to me. It's the equals handling of null key values which is wrong.) And your description of a hash is wrong too. You describe it as a **unique** ID - which is absolutely not the case, and isn't feasible for this case. (Heck, a `hashCode` method which always returns a constant value is "correct" - just not efficient.) – Jon Skeet May 02 '13 at 17:42
  • @JonSkeet read carefully: It has to be a uniformly distributed id to be effective. If you return the same number every time, then you will have a collision on every add. Performance thusly becomes O(N) as it essentially is a LinkedList. As for the other point, I have corrected my above assertions. Others had mentioned the hash as being wrong so I explained why a wrong hash would appear to ignore equals. Thanks for the feedback – Christian Bongiorno May 03 '13 at 00:11
0

I don't know exactly why but I feel compelled to point out that when you implement equals, part of the contract for the equals method that you should uphold is that it is reflexive, meaning the same object equals itself. So your equals should return true.

My thought for an answer to your question is that the .equals() method is not getting called when you merely add the two items of the same instance to the HashSet. Probably only hashCode is called up to that point. Since hashCode returns the key it will return the same key each time and the item just gets hashed to the same place twice, leaving only one item in the set.

Scott Shipp
  • 2,241
  • 1
  • 13
  • 20
  • no, that is not how hashCodes are used. you can have many items in a HashSet with the same hashCode. – jtahlborn May 02 '13 at 17:14
  • Once there is a hash collision, the `equals()` method is used. – Sotirios Delimanolis May 02 '13 at 17:18
  • @SotiriosDelimanolis Under normal circumstances, yes. But not in this example. You can try it youself to see. – Daniel Kaplan May 02 '13 at 17:20
  • @tieTYT I'm just trying to clarify in case there was confusion, not related to your question. – Sotirios Delimanolis May 02 '13 at 17:21
  • So why did I get downvoted if my answer (.equals method is not getting called) was right? – Scott Shipp May 02 '13 at 18:04
  • @jtahlborn From the Java API: Adds the specified element to this set if it is not already present...If this set already contains the element, the call leaves the set unchanged and returns false. – Scott Shipp May 02 '13 at 18:11
  • You answer is getting downvoted because it implies that the hashCode determines inclusion in the HashSet, which it does not. – jtahlborn May 02 '13 at 18:38
  • Meaning what exactly? HashCode method is definitely used to store the value in the HashSet. Is that what you mean by "determines inclusion." So if it is true (which it is) why is that a problem exactly? If you go review the code when hashcodes are equal and objects are equal (your answer above states this part of it) the object in the collection gets overwritten with the current object. My answer seems to explain the OP's situation exactly. It's not a GENERAL CASE and I never claimed it was. – Scott Shipp May 02 '13 at 20:25
  • OK edited for crystal clarity. – Scott Shipp May 02 '13 at 20:49