While working on performance-sensitive code, I noticed that HashSet is documented to offer constant-time performance for its add, remove, contains, and size methods, as long as the hash function returns unique hashes for the elements in the set (see the HashSet Javadoc).
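That assumption should hold here, since Integer.hashCode() is specified to return the wrapped int value itself, so distinct Integer elements always get distinct hash codes. A quick sketch to illustrate (not part of my benchmark):

// Quick check: Integer.hashCode() returns the wrapped int value,
// so every distinct Integer has a distinct hash code.
public class IntegerHashDemo {
    public static void main(String[] args) {
        for (int i : new int[] { 0, 1, 42, -7, Integer.MAX_VALUE }) {
            // Prints e.g. "hashCode(42) = 42" for every value
            System.out.println("hashCode(" + i + ") = " + Integer.valueOf(i).hashCode());
        }
    }
}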
When executing the following code, however, I found that this constant-time performance for the add method does not hold. Doubling numberOfElements from 10,000,000 to 20,000,000 should roughly double the runtime, but instead it takes roughly 3-4 times as long.
HashSet<Integer> myHashSet;
int numberOfElements;

// Warm-up code
numberOfElements = 10000000; // 10 million elements
myHashSet = new HashSet<>(540000000); // Create HashSet with large enough initial capacity to avoid resize operations
for (int i = 0; i < numberOfElements; i++) {
    myHashSet.add(i); // Add element
}

// First benchmark
numberOfElements = 5000000; // 5 million elements
myHashSet.clear(); // Clear HashSet
long startTime = System.nanoTime(); // Save start time
for (int i = 0; i < numberOfElements; i++) {
    myHashSet.add(i); // Add elements
}
System.out.print("Set filled with " + myHashSet.size() + " records in ");
System.out.println((System.nanoTime() - startTime) / 1000000000 + "s");

// Second benchmark
numberOfElements = 10000000; // 10 million elements
myHashSet.clear(); // Clear HashSet
startTime = System.nanoTime(); // Save start time
for (int i = 0; i < numberOfElements; i++) {
    myHashSet.add(i); // Add elements
}
System.out.print("Set filled with " + myHashSet.size() + " records in ");
System.out.println((System.nanoTime() - startTime) / 1000000000 + "s");

// Third benchmark
numberOfElements = 20000000; // 20 million elements
myHashSet.clear(); // Clear HashSet
startTime = System.nanoTime(); // Save start time
for (int i = 0; i < numberOfElements; i++) {
    myHashSet.add(i); // Add elements
}
System.out.print("Set filled with " + myHashSet.size() + " records in ");
System.out.println((System.nanoTime() - startTime) / 1000000000 + "s");

// Fourth benchmark
numberOfElements = 40000000; // 40 million elements
myHashSet.clear(); // Clear HashSet
startTime = System.nanoTime(); // Save start time
for (int i = 0; i < numberOfElements; i++) {
    myHashSet.add(i); // Add elements
}
System.out.print("Set filled with " + myHashSet.size() + " records in ");
System.out.println((System.nanoTime() - startTime) / 1000000000 + "s");
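To make the scaling easier to compare across runs, the elapsed time could also be reported as an average cost per add, which should stay flat if add really is O(1). A minimal sketch of such a helper (hypothetical, for illustration only; not used in the code above):

// Hypothetical helper (not part of the benchmark above): prints the average
// cost per add, which should stay constant if add really is O(1).
static void reportAverage(long elapsedNanos, int numberOfElements) {
    System.out.println(numberOfElements + " adds took " + elapsedNanos / 1000000 + " ms ("
            + (double) elapsedNanos / numberOfElements + " ns per add)");
}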
I'd really appreciate any insight into the root cause of this behavior.
EDIT: I added the code to measure performance in the example above.