8

I had a doubt about the set in Collections framework. How the set itself will identify duplicates and how it will come to know? Could anyone please explain how it is implemented? How hashcode and equals method will come into the picture? I need a brief explanation as it is really important for me.

jatin_ghataliya
  • 144
  • 1
  • 1
  • 16
Kosuri Naresh
  • 103
  • 1
  • 2
  • 14

6 Answers6

8

This is explained in the javadoc for Set.

A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.

Daniel Kaplan
  • 62,768
  • 50
  • 234
  • 356
  • But it doesn't clarify how it checks if the element is already present or not. – Ritveak Mar 23 '22 at 01:34
  • @Ritveak actually, it does, it's just so abstract it doesn't seem so. `equals` is something a developer implements, so that's as specific as it can get. An element is already present if it `equals` another element. `equals` is a complex topic on its own but it's worth understanding. I think Effective Java, a book, is a good place to learn, but if you read articles for a while you can probably get the same information. – Daniel Kaplan Mar 23 '22 at 22:41
  • It is fine, I dug deeper into the HashSet implementation and found that they are using hashmap inside by using the element as key. They are nowhere using equals though, they use hash. – Ritveak Mar 24 '22 at 02:14
  • @Ritveak Okay, but how does `HashMap` find elements by key? Keep in mind, my post isn't saying you need to *use* `equals` in your implementation. It simply *should be true* that "sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element." – Daniel Kaplan Mar 24 '22 at 07:29
8

It roughly works like this

if (!collection.contains(element))
    collection.add(element);

And the contains method, would use equals/hashcode.

In TreeSet, the elements are stored in a Red-Black Tree, whereas HashSet, uses a HashMap.

Infact, the way it is added to the container is specific to the element (the spot on the tree, bucket in the hashtable), thus the adding itself uses equals/hashcode.

Karthik T
  • 31,456
  • 5
  • 68
  • 87
4

The actual implementation depends on the container. HashMap lookup the item bucked given its hashCode then test the inserted object and the stored ones by using equals (this is one of the reasons for requiring that a.equals(b) iff b.equals(a)).

TreeMap, on the other hand, relies on the result of the compareTo method (if the element implements Comparable or the compare method implemented by a Comparator). If compare returns 0, the elements are regarded as "equals". (Note that compareTo should be consistent with equals, i.e. a.compareTo(b)==0 iff a.equals(b)).

Javier
  • 12,100
  • 5
  • 46
  • 57
  • Thanks for you reponse.. Can u please explain with these example. s.add("123");s.add("123"); Now both elements are same. How the hashcode will calcuate and how equal method will validate? I am pretty new to this concpets. – Kosuri Naresh Feb 25 '13 at 09:49
2

HashSet uses hashcode() toresolve bucket where object should go and equals() method to check equality on objects lying on that bucket

jmj
  • 237,923
  • 42
  • 401
  • 438
  • 1
    Do you mean first of all it will find the location as to where it would be placed and then it would compare using `equals` for a duplicate? Sounds analogous to cache memory hits.Is my interpretation correct? – Siddharth Kamaria Dec 24 '14 at 17:25
  • Did you get your query resolved? Also, how does it check if an element is present? using hashCode or value equals? – Ritveak Mar 23 '22 at 01:37
0

"Can u please explain with these example. s.add("123");s.add("123");"

For the above query explained in context with Set interface, Please refer to the below snippet and explanation.

public void setTest() {
    Set<String> obj = new HashSet<>();
    System.out.println(obj.add("123")); //Output : true
    System.out.println(obj.add("123")); //Output : false
}

If you notice in the above snippet, we have added 123 two times. for the first time add SOP will return true. then for the second time of added "123", SOP will return false.

With this we can understand that, if we add same value in the Set for the second time or more, then the duplicated value will be skipped.

0

Basically set is an interface which has many different implementations, let's take HashSet implementation for now, to answer you question, I downloaded the source code and went inside the HashSet class, then I searched add method and saw that it uses HashMap to store unique values. It uses the value to be stored as key of a HashMap and the corresponding value of key (i.e PRESENT in below code snippet) as a constant value(this value is a dummy value), we all know that keys of map are unique. So that is how it works. Code below:

private static final Object PRESENT = new Object();
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}