3

When an element with a different hashCode is added to a HashSet, a new bucket has to be added, right? To what data structure would this new bucket be added? Does it again resort to some sort of array that gets resized each time a new element is added, thus making addition and deletion in the HashSet O(n)?

After reading a few posts, I learned that some JDK implementations use a HashMap as the backing collection for HashSet, but then what does that HashMap use for this?

Minerva
  • 71
  • 1
  • 8
  • 4
    Luke, use the source. http://www.docjar.com/html/api/java/util/HashSet.java.html http://www.docjar.com/html/api/java/util/HashMap.java.html – Matt Ball Mar 08 '13 at 04:27
  • look at this thread http://stackoverflow.com/questions/9119840/how-do-hashsets-in-java-work – Rais Alam Mar 08 '13 at 04:30

3 Answers

6

You can always look at the source code.

And there you will see that HashMap has an array of buckets:

transient Entry[] table;

Every bucket is essentially a linked list:

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    // ... constructor and Map.Entry methods omitted
}

The array gives you constant-time access to the bucket for a given hash code, and then you have to loop through that list (which hopefully does not have more than one or two entries):

final Entry<K,V> getEntry(Object key) {
         int hash = (key == null) ? 0 : hash(key.hashCode());
         for (Entry<K,V> e = table[indexFor(hash, table.length)];
              e != null;
              e = e.next) {
             Object k;
             if (e.hash == hash &&
                 ((k = e.key) == key || (key != null && key.equals(k))))
                 return e;
         }
         return null;
}

When an element with a different hashCode is added to a HashSet, a new bucket has to be added, right?

When an element with the same hashCode as an existing one is added, it will go into the same bucket (chained into that bucket's linked list).

When an element with a new hashCode is added, it may or may not go to a different bucket (because you have way more hashCodes than buckets).

All buckets are created in advance when the Map is sized. If the capacity threshold is exceeded, the map is resized with more buckets and every entry gets rehashed into the new buckets.
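
As a rough sketch of the insertion path (simplified, following the pre-Java-8 HashMap naming, but not the exact JDK code):

// Simplified sketch of HashMap.put (pre-Java-8 style); not the exact JDK code.
// Null keys and the resize check are omitted here (see the resize sketch below).
public V put(K key, V value) {
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);        // pick the bucket

    // Same key already present in that bucket's chain? Replace the value.
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        if (e.hash == hash && (e.key == key || key.equals(e.key))) {
            V old = e.value;
            e.value = value;
            return old;
        }
    }

    // Otherwise chain a new entry at the head of the same bucket
    // (a colliding hashCode simply makes this chain one entry longer).
    table[i] = new Entry<K,V>(hash, key, value, table[i]);
    size++;
    return null;
}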

To what data structure would this new bucket be added?

Buckets are not added. There is a fixed array of buckets. When you need more capacity, the whole structure is rebuilt with a bigger array.

Does it again resort to some sort of array that gets resized each time a new element is added, thus making addition and deletion in the HashSet O(n)?

Not each time. Ideally never. Only when you miscalculated the capacity and end up needing more. Then it gets expensive, as everything is copied to a new array. This process is essentially the same as with ArrayList.
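
A rough sketch of that resize step (simplified from the pre-Java-8 HashMap's resize/transfer logic; not the exact JDK code):

// Called when size exceeds threshold, which is roughly capacity * loadFactor
// (0.75 by default). Simplified sketch, not the exact JDK code.
void resize(int newCapacity) {
    Entry[] oldTable = table;
    table = new Entry[newCapacity];

    // Rehash every existing entry into the new, bigger array: O(n) work,
    // but the capacity doubles each time, so this happens rarely.
    for (Entry head : oldTable) {
        for (Entry e = head; e != null; ) {
            Entry next = e.next;
            int i = indexFor(e.hash, newCapacity);   // bucket index changes with the new length
            e.next = table[i];                       // relink into the new bucket
            table[i] = e;
            e = next;
        }
    }
    threshold = (int) (newCapacity * loadFactor);
}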

Thilo
  • 257,207
  • 101
  • 511
  • 656
0

A lot can be gleaned by even just reading the Javadoc for HashSet and HashMap. A HashSet is backed by a HashMap.

According to the HashMap Javadoc, it's defined by an initial capacity and a load factor. The backing hash table won't be resized until the number of entries exceeds capacity × load factor, so to answer one of your questions: no, a resize does not happen on every addition to or deletion from the map.
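
For example, if you know roughly how many elements the set will hold, you can pass an initial capacity and load factor up front so that no rehashing happens while it fills up (the numbers here are just for illustration):

// Pre-size the backing table: capacity 1024, load factor 0.75, so no
// resize happens until more than 1024 * 0.75 = 768 elements are added.
Set<String> names = new HashSet<String>(1024, 0.75f);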

Peter
  • 6,354
  • 1
  • 31
  • 29
0

HashMap uses an array of Map.Entry objects: each element of the array is a (key, value) pair.

When an element is inserted, the bucket index is calculated from the key's hash code. If two different keys land in the same bucket (a hash collision), their entries are chained together in that bucket as a linked list. The consequence is that operations on a bucket holding many colliding entries become expensive: in the degenerate case where everything ends up in one bucket, they are O(n).

In order to keep the chains short, HashMap automagically resizes when its size exceeds a given percentage of the internal array's capacity (the "load factor", 75% by default). Roughly speaking, this means a 75-element HashMap will be backed by an array of at least 100 slots. Decreasing the load factor increases the memory overhead, but keeps the average operation close to constant time.

Note that worst-case insertion may still be O(n) if every element has the same hashCode.
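
For instance, this contrived example (a hypothetical BadKey class, not from the JDK) forces every element into the same bucket, so on a pre-Java-8 HashMap each insertion has to scan the whole chain:

import java.util.HashSet;
import java.util.Set;

// Contrived worst case: every key hashes to the same bucket, so the set
// degenerates into one long chain and each add/contains scans it: O(n).
public class BadHashDemo {
    static class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }   // constant hash for every instance
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Set<BadKey> keys = new HashSet<BadKey>();
        for (int i = 0; i < 10000; i++) {
            keys.add(new BadKey(i));   // every insertion walks the whole chain
        }
        System.out.println(keys.size()); // 10000, but built in quadratic time overall
    }
}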

Javier
  • 12,100
  • 5
  • 46
  • 57
  • "Note that worst-case insertion may still be O(n) if every element has the same hashCode." I think this property was used in the recent hashtable attacks against Java application servers: Construct query parameters that make a very unfortunate hashtable. – Thilo Mar 08 '13 at 04:41