Problem:
I need to compare two hash table implementations (essentially HashMap against an alternative) and draw a reasonable conclusion. I am not after 100% accuracy, just an estimate that points in the right direction.
I am interested in the difference not only per operation but mainly in the hash table as a "whole".
I have no strict requirement on speed, so if the other implementation is reasonably slower I can accept that, but I do expect/require that its memory usage be better (since one of the hash tables is backed by a primitive table).
What I have done so far:
Originally I created my own custom "benchmark" with loops and many calls to hint the GC, to get a feel for the difference, but I am reading online that using a standard tool is more reliable/appropriate.
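From what I read, JMH seems to be the standard harness people recommend. If I understand it correctly, a put benchmark there would look roughly like the sketch below (assuming JMH is on the classpath; the class name MapPutBenchmark and the size are made up for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MapPutBenchmark {
    @Param({"1000000"})  // number of entries per run; illustrative only
    int size;

    Integer[] keys;
    String[] values;

    @Setup(Level.Trial)
    public void setup() {
        keys = new Integer[size];
        values = new String[size];
        for (int i = 0; i < size; i++) {
            keys[i] = i;
            values[i] = "" + i;
        }
    }

    @Benchmark
    public Map<Integer, String> putAll() {
        // A fresh map per invocation so every measurement does the same work
        Map<Integer, String> map = new HashMap<Integer, String>();
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
        return map;  // returning the map keeps the JIT from eliminating the loop
    }
}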
Example of my own approach (HashMapInterface is just a wrapper so I can switch among implementations):
Integer[] keys = new Integer[10000000];
String[] values = new String[10000000];
for (int i = 0; i < keys.length; ++i) {
    keys[i] = i;
    values[i] = "" + i;
}

if (operation.equals("put")) {
    runOperation(map, keys, values);
}
public long[] runOperation(HashMapInterface<String> map, Integer[] keys, String[] values) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long sum = 0;
    for (int run = 0; run < 10; ++run) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < keys.length; ++i) {
            map.put(keys[i], values[i]);
        }
        long total = System.currentTimeMillis() - start;
        System.out.println(total / 1000d + " seconds");
        if (total < min) {
            min = total;
        }
        if (total > max) {
            max = total;
        }
        sum += total;
        // start each run with a fresh, empty map
        map = createNewHashMap();
        hintsToGC();
    }
    return new long[] {min, max, sum};
}
public void hintsToGC() {
    // Repeatedly suggest a collection; System.gc() is only a hint, so sleep between calls
    for (int i = 0; i < 20; ++i) {
        System.out.print(". ");
        System.gc();
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
private HashMapInterface<String> createNewHashMap() {
    if (jdk) {
        return new JDKHashMapWrapper();
    }
    else {
        return new AlternativeHashMapWrapper();
    }
}
public class JDKHashMapWrapper implements HashMapInterface<String> {
    HashMap<Integer, String> hashMap;

    JDKHashMapWrapper() {
        hashMap = new HashMap<Integer, String>();
    }

    public String put(Integer key, String value) {
        return hashMap.put(key, value);
    }
    //etc
}
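The alternative wrapper has the same shape and just unboxes at the boundary. For illustration only (I am not saying this is the implementation I am testing), with a primitive-keyed map such as Trove's TIntObjectHashMap it would look like:

import gnu.trove.map.hash.TIntObjectHashMap;

public class AlternativeHashMapWrapper implements HashMapInterface<String> {
    // Example only: any int-keyed primitive-backed map would be wrapped the same way
    TIntObjectHashMap<String> hashMap = new TIntObjectHashMap<String>();

    public String put(Integer key, String value) {
        return hashMap.put(key.intValue(), value);  // unbox the Integer key
    }
    //etc
}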
(I want to test put, get, containsKey, and the memory utilization.)
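For the memory side, so far I only take a rough before/after heap snapshot around building the map, something like the sketch below (usedMemory is a helper I made up; since System.gc() is only a hint, I know the numbers are approximate):

static long usedMemory() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
}

// Rough heap delta attributable to the populated map
hintsToGC();
long before = usedMemory();
HashMapInterface<String> map = createNewHashMap();
for (int i = 0; i < keys.length; i++) {
    map.put(keys[i], values[i]);
}
hintsToGC();
long after = usedMemory();
System.out.println("~" + ((after - before) / (1024 * 1024)) + " MB used by the map");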
Can I be sure that my approach gives reasonable measurements?
If not, what would be the most appropriate tool to use, and how?
Update:
- I also test with random keys (also ~10M of them), generated using SecureRandom (sketched after this list).
- Whenever the hash table resizes, I print the ratio of the logical size of the hash table to the size of the actual backing table, to track the load factor.
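The random-key generation is roughly:

import java.security.SecureRandom;

SecureRandom random = new SecureRandom();
Integer[] keys = new Integer[10000000];
for (int i = 0; i < keys.length; i++) {
    keys[i] = random.nextInt();  // duplicates are possible, which is fine for my tests
}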
Update:
For my specific case, where I am also interested in integer keys, what kind of pitfalls are there with my approach?
UPDATE after @dimo414's comments:

"Well at a minimum the hashtable as a 'whole' isn't meaningful"
I mean how the hash table behaves under various loads, both at runtime and in memory consumption.

"Every data structure is a tradeoff of different methods"
I agree. My trade-off is an acceptable access penalty in exchange for a memory improvement.

"You need to identify what features you're interested in verifying"
1) put(key, value);
2) get(key);
3) containsKey(key);
4) all of the above when there are many entries in the hash table
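For 2) and 3) concretely, I time loops analogous to the put loop, on a map pre-populated with the ~10M entries (a sketch, assuming the wrapper also exposes get and containsKey):

// map has already been filled with keys/values
long start = System.currentTimeMillis();
for (int i = 0; i < keys.length; i++) {
    if (map.get(keys[i]) == null) {
        throw new AssertionError("missing key");  // also keeps the result live
    }
}
System.out.println("get: " + (System.currentTimeMillis() - start) + " ms");

start = System.currentTimeMillis();
int hits = 0;
for (int i = 0; i < keys.length; i++) {
    if (map.containsKey(keys[i])) {
        ++hits;
    }
}
System.out.println("containsKey: " + (System.currentTimeMillis() - start) + " ms, " + hits + " hits");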