A HashSet is backed by a HashMap. From it's JavaDoc:
This class implements the Set interface, backed by a hash table (actually a HashMap instance)
When taking a look at the source we can also see how they relate to each other:
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
Therefore a HashSet<E>
is backed by a HashMap<E,Object>
. For all HashSets in our application we have one reference object PRESENT
that we use in the HashMap
for the value. While the memory needed to store PRESENT
is neglectable, we still store a reference to it for each value in the map.
Would it not be more efficient to use null
instead of PRESENT
? A further consideration then is should we forgo the HashSet
altogether and directly use a HashMap
, given the circumstance permits the use of a Map
instead of a Set
.
My basic problem that triggered these thoughts is the following situation: I have a collection of objects on with the following properties:
- big collection of objects > 30'000
- Insertion order is not relevant
- Efficient check if an item is contained
- Adding new items to the collection is not relevant
The chosen solution should perform optimal in the context to the above criteria as well as minimize memory consumption. On this basis the datastructures
HashSet
andHashMap
spring to mind. When thinking about alternative approaches, the key question is:
How to check containement efficiently?
The only answer that comes to my mind is using the items hash to calculate the storage location. I might be missing something here. Are there any other approaches?
I had a look at various issues, that did shed some light on the issue, but not quietly answered my question:
- Java : HashSet vs. HashMap
- clarifying facts behind Java's implementation of HashSet/HashMap
- Java HashSet vs HashMap
I am not looking for suggestions of any alternative libraries or framework to address this, but I want to understand if there is an other way to think about efficient containement checking of an element in a Collection
.