Consider the Employer class:

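import java.io.Serializable;
import java.util.Objects;

public class Employer implements Serializable {

  private Long id;
  private String name;

  // Setter used by the snippets below
  public void setId(Long id) {
    this.id = id;
  }

  @Override
  public boolean equals(Object obj) {

    if (obj == null)
        return false;
    if (obj instanceof Employer) {
        Employer employer = (Employer) obj;
        // Compare the Long values; == would compare object references
        if (Objects.equals(this.id, employer.id)) {
            return true;
        }
    }
    return false;
  }

  // Idea from Effective Java: Item 9
  @Override
  public int hashCode() {
    int result = 17;
    // Objects.hashCode tolerates an id that has not been set yet
    result = 31 * result + Objects.hashCode(id);
    //result = 31 * result + name.hashCode();
    return result;
  }
}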

With two Employer objects created:

Employer employer1 = new Employer();
employer1.setId(10L);

Employer employer2 = new Employer();
employer2.setId(11L);

After adding them to a HashSet, the size is 2. HashSet internally uses a HashMap to maintain uniqueness:

private transient HashMap<E,Object> map;

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}

Now, if I set the id of the second employer to be the same as that of the first, i.e.

employer2.setId(10L);

the size still remains 2. Why is it not 1? Do the invariants get destroyed?

Sergey Kalinichenko
Farhan stands with Palestine

2 Answers

The size still remains 2. Why is it not 1? Do the invariants get destroyed?

If you modify any of the properties used to compute hashCode and equals for an instance already in the HashSet, the HashSet implementation is not aware of that change.

Therefore it will keep the two instances, even though they are now equal to each other.
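To make this concrete, here is a minimal sketch of the situation. It uses a trimmed-down stand-in for the question's Employer class (only the id field), and the class name StaleHashDemo and the helper buildSet are made up for illustration:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class StaleHashDemo {

    // Trimmed-down stand-in for the question's Employer (mutable id only).
    static class Employer {
        private Long id;

        void setId(Long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Employer
                    && Objects.equals(id, ((Employer) obj).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    // Adds two distinct employers, then mutates the second one
    // so that it becomes equal to the first.
    static Set<Employer> buildSet(Employer e1, Employer e2) {
        e1.setId(10L);
        e2.setId(11L);
        Set<Employer> set = new HashSet<>();
        set.add(e1);
        set.add(e2);
        e2.setId(10L); // the set is not notified of this change
        return set;
    }

    public static void main(String[] args) {
        Employer e1 = new Employer();
        Employer e2 = new Employer();
        Set<Employer> set = buildSet(e1, e2);
        System.out.println(e1.equals(e2)); // true
        System.out.println(set.size());    // 2
    }
}
```

The set placed e2 according to the hash it had when add was called, and nothing triggers a re-check when the id changes afterwards.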

You shouldn't make such updates to instances that are members of HashSets (or keys in HashMaps). If you must make such changes, remove the instance from the Set before mutating it and re-add it afterwards.
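The remove, mutate, re-add sequence could look roughly like this. It is only a sketch: the Employer class is again a trimmed-down stand-in, and changeId is a hypothetical helper, not something HashSet provides:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class ReinsertDemo {

    // Trimmed-down stand-in for the question's Employer (mutable id only).
    static class Employer {
        private Long id;

        void setId(Long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Employer
                    && Objects.equals(id, ((Employer) obj).id);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(id);
        }
    }

    // Hypothetical helper: changes an id without leaving a stale entry behind.
    static void changeId(Set<Employer> set, Employer e, Long newId) {
        boolean wasMember = set.remove(e); // remove while the hash is still valid
        e.setId(newId);
        if (wasMember) {
            set.add(e); // re-hashed; rejected if an equal element already exists
        }
    }

    static int run() {
        Employer e1 = new Employer();
        e1.setId(10L);
        Employer e2 = new Employer();
        e2.setId(11L);

        Set<Employer> set = new HashSet<>();
        set.add(e1);
        set.add(e2);

        changeId(set, e2, 10L);
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 1
    }
}
```

Because e2 is re-added after the change, the set compares it against e1, finds them equal, and keeps only one element.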

Eran

All hash-based containers, including HashSet<T>, make a very important assumption about the hash codes of their keys: they assume that an object's hash code never changes while the object is inside the container.

Your code violates this assumption by modifying the instance while it is still in the hash set. There is no practical way for HashSet<T> to react to this change, so you must pick one of two ways to deal with this issue:

  • Never modify keys of hash-based containers - This is by far the most common approach, often achieved by making hash keys immutable.
  • Keep track of modifications, and re-hash objects manually - essentially, your code makes sure that all modifications to hash keys happen while they are outside containers: you remove the object from the container, make modifications, and then put it back.

The second approach often becomes a source of maintenance headaches. When you need to keep mutable data in a hash-based container, a good approach is to use only final fields in the computation of your hash code and equality checks. In your example this would mean making the id field final and removing the setId method from the class.
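As a rough sketch of that last suggestion: the version below keeps name mutable but bases equals and hashCode only on a final id. The constructor, the accessors, and the names FinalKeyDemo and stillFound are assumptions added for illustration; they are not part of the original class.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class FinalKeyDemo {

    static final class Employer {
        private final Long id; // identity: fixed at construction
        private String name;   // mutable, not used by equals/hashCode

        Employer(Long id, String name) {
            this.id = Objects.requireNonNull(id, "id");
            this.name = name;
        }

        String getName() {
            return name;
        }

        void setName(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object obj) {
            return obj instanceof Employer && id.equals(((Employer) obj).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    // Mutating the non-key field leaves the set's bookkeeping intact.
    static boolean stillFound() {
        Set<Employer> set = new HashSet<>();
        Employer e = new Employer(10L, "Alice");
        set.add(e);

        e.setName("Bob"); // safe: name does not affect the hash code

        boolean duplicateRejected = !set.add(new Employer(10L, "Carol"));
        return set.contains(e) && set.size() == 1 && duplicateRejected;
    }

    public static void main(String[] args) {
        System.out.println(stillFound()); // true
    }
}
```

With this layout there is no code path that can change an element's hash code while it sits in the set, so the remove and re-add bookkeeping is not needed.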

Sergey Kalinichenko
  • Exactly. I learned this after reading Item 39 of Effective Java: "For example, if you are considering using a client-provided object reference as an element in an internal Set instance or as a key in an internal Map instance, you should be aware that the invariants of the set or map would be destroyed if the object were modified after it is inserted." – Farhan stands with Palestine Jan 01 '18 at 14:04
  • Eventually I'm going to figure out what drives you to answer the same question over and over again. Until then, happy 2018! – Sotirios Delimanolis Jan 01 '18 at 15:22
  • @SotiriosDelimanolis Why bother figuring out when you can ask? The reason is very simple - I forget most of my answers as soon as I am done typing them. If I think I may have answered the question before, I do a quick search to see if I can find something relevant. Otherwise, I type up an answer. Happy New Year! – Sergey Kalinichenko Jan 01 '18 at 15:58