8

The HashSet class has an add(Object o) method, which is not inherited from another class. The Javadoc for that method says the following:

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2). Here is a simple example:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;

public class BadHashCodeClass {

    /**
     * A hashcode that will randomly return an integer, so it is unlikely to be the same
     */
    @Override
    public int hashCode(){
        return new Random().nextInt();
    }

    /**
     * An equal method that will always return true
     */
    @Override
    public boolean equals(Object o){
        return true;
    }

    public static void main(String... args){
        HashSet<BadHashCodeClass> hashSet = new HashSet<>();
        BadHashCodeClass instance = new BadHashCodeClass();
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Elements in hashSet: " + hashSet.size());

        Iterator<BadHashCodeClass> iterator = hashSet.iterator();
        BadHashCodeClass e = iterator.next();
        BadHashCodeClass e2 = iterator.next();
        System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): " + (e==null ? e2==null : e.equals(e2)));
    }

The results from the main method are:

Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true

As the example above clearly shows, HashSet was able to add two elements where e.equals(e2).

I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. But I can't figure out what exactly. What am I missing?

Thunderforge
  • 19,637
  • 18
  • 83
  • 130
  • Do not listen to the ranting of madmen; whatever happens in this case is just the screaming of a broken system. You should only worry about correcting the hashset problems in your program – Richard Tingle Jan 06 '14 at 07:01
  • Whatever behaviour you observe here can change at any time without warning because it is the response of a broken contract which is thereby undocumented and there is no obligation on future versions of java to maintain the behaviour. Relying on this behaviour would be a serious mistake – Richard Tingle Jan 06 '14 at 07:06
  • 1
    "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question." ~Charles Babbage – dimo414 Jan 06 '14 at 07:07
  • @dimo414 Babbage's quote is nice, but if it's not clear why the figures are wrong, then there is still a rightful confusion. Indeed, the instructions for the machine said the figures I was putting in were absolutely right, without reminding me that it was relying on the assumption that some other figures must be right as well. – Thunderforge Jan 06 '14 at 07:10
  • @Thunderforge you put wrong numbers in earlier in your "calculation", those wrong numbers were later used by hashset, hence problems – Richard Tingle Jan 06 '14 at 07:12
  • 1
    @Thunderforge Incorrect, your definition of `equals()` is *fundamentally broken* if it does not correspond with the definition of `hashCode()`. – dimo414 Jan 06 '14 at 07:12
  • 1
    @Thunderforge what do you program in; most good IDEs will warn you of a broken equals/hashcode contract: I know netbeans does (or at least when you've overridden one without the one) – Richard Tingle Jan 06 '14 at 07:21
  • @RichardTingle That's awesome to know! Sadly, it doesn't look like Eclipse does. You can turn on a warning if one is implemented and not the other, but not if they will return different things. – Thunderforge Jan 06 '14 at 07:24
  • @Thunderforge now that I think about it I think netbeans is the same; checks you've done something; not it's correctness (failing to do anything is the usual mistake) – Richard Tingle Jan 06 '14 at 07:25
  • See also https://stackoverflow.com/questions/14504901/when-does-hashset-add-method-calls-equals – Raedwald Nov 21 '18 at 08:56

5 Answers5

23

I think what you're really trying to ask is:

"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"

The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.

I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.


The equals() documentations states (emphasis added):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.

If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:

static class Entry<K,V> implements Map.Entry<K,V> {
  final K key;
  V value;
  Entry<K,V> next;
  int hash;
  ...
}

HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.

Now lets look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:

public V put(K key, V value) {
  ...
  int hash = hash(key);
  int i = indexFor(hash, table.length);
  for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
      // Replace existing element and return
    }
  }
  // Insert new element
}

In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.


Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.

Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?

Community
  • 1
  • 1
dimo414
  • 47,227
  • 18
  • 148
  • 244
  • Thank you very much for taking the time to thoroughly explain not only the underlying assumptions I was neglecting, but also pointing out the real question I'm asking (and that I can get better responses by not claiming it's a bug in my question, even though it may be my first instinct). Also, you've done a great job explaining exactly why such underlying assumptions are important, Ultimately, I think that this answer is the most thorough and helpful for anyone wondering this question, so I've decided to award the answer to this. – Thunderforge Jan 06 '14 at 09:50
9

You've violated the contract of equals/hashCode basically:

From the hashCode() docs:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

and from equals:

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

HashSet relies on equals and hashCode being implemented consistently - the Hash part of the name HashSet basically implies "This class uses hashCode for efficiency purposes." If the two methods are not implemented consistently, all bets are off.

This shouldn't happen in real code, because you shouldn't be violating the contract in real code...

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 1
    Though `hashCode` will return different `hash code` but equals is always returning true. so isn't all the objects are equal? – eatSleepCode Jan 06 '14 at 06:59
  • Indeed, I discovered this because in my code two objects are equal, but somehow have different hashcodes. So "it shouldn't happen in real code" is nice, but the reality is that the Javadoc says it's supposed to work. – Thunderforge Jan 06 '14 at 07:06
  • @Thunderforge you have discovered a bug in your own code (bad hashcode function or equals function) not in java. Bugs in your code exist in real life; then you fix them – Richard Tingle Jan 06 '14 at 07:10
  • 1
    Violating the contracts in these methods is a critical bug, and until you resolve it you cannot hope to have any sort of reliable behavior. You'll need to figure out this "somehow" before you can go any further. – dimo414 Jan 06 '14 at 07:10
  • Right, but I was wondering why this behavior existed. I fixed my bug once I discovered it (and became aware of the hashcode contract), but I'm still puzzled as to why the Javadoc didn't clearly state that this. After all, it simply says that `(e==null ? e2==null : e.equals(e2))` and if that returns true, but the add method still lets you add, the user is really confused. – Thunderforge Jan 06 '14 at 07:11
  • @Thunderforge because the entire program is entirely "off the grid" in this case. There are probably **many** many other problems in your program that you haven't noticed yet due to the contract violation. The javadoc can't mention every condition where you've broken things elsewhere – Richard Tingle Jan 06 '14 at 07:15
  • Fortunately it was only with one class and I've always used auto-generated hashcode and equals otherwise. But yeah, I now realize that it can derail a program if that contract is broken. – Thunderforge Jan 06 '14 at 07:17
  • 2
    @Thunderforge: Well the Javadoc is assuming you're aware of the methods you're overriding - the Javadoc for `equals` and `hashCode` are clear, so you should have noticed when you overrode them - I wouldn't expect *every* class which uses `equals` and `hashCode` to contain a warning saying that if you've implemented them badly you're going to have a bad time. – Jon Skeet Jan 06 '14 at 07:46
2
@Override
public int hashCode(){
    return new Random().nextInt();
}

You are returning different has codes for same object every time it is evaluated. Obviously you will get wrong results.


add() function is as follows

public boolean add(E e) {
return map.put(e, PRESENT)==null;
}

and put() is

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

If you notice first has is calculated which is different in your case which is why object is added. equals() comes into picture only if hash are same for objects i.e collision has occured. Since in case hash are different equals() is never executed

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

Read more on what short circuiting is. since e.hash == hash is false nothing else is evaluated.

I hope this helps.

Aniket Thakur
  • 66,731
  • 38
  • 279
  • 289
0

because hashcode() is really implemented very badly,

it will try to equate in each random bucket on each add(), if you return constant value from hashcode() it wouldn't let you enter any

jmj
  • 237,923
  • 42
  • 401
  • 438
0

It is not required that hash codes be different for all elements! It is only required that two elements are not equal.

HashCode is used first to find the hash bucket the object should occupy. If hadhcodes are different, objects are assumed to be not equal. If hashcodes are equal, then the equals() method is used to determine equality. The use of hashCode is an efficiency mechanism.

And...
Your hash code implementation violates the contract that it should not change unless the objects identifying fields change.

Bohemian
  • 412,405
  • 93
  • 575
  • 722