I started benchmarking ways to find key by value in Guava cache and I noticed strange behaviour related to concurrency level. I'm not sure if that's a bug or undefined behaviour or maybe even expected but not specified.
My benchmark is supposed to find key by value in Guava Cache, not an usual thing to do, I know.
That's my complete benchmark class:
@Fork(4)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 1, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 4, time = 100, timeUnit = TimeUnit.MILLISECONDS)
public class ValueByKey {
private Long counter = 0L;
private final int MAX = 2500;
private final LoadingCache<String, Long> stringToLong = CacheBuilder.newBuilder()
.concurrencyLevel(1)
.maximumSize(MAX + 5)
.build(new CacheLoader<String, Long>() {
public Long load(String mString) {
return generateIdByString(mString);
}
});
private final Map<String, Long> mHashMap = new Hashtable<>(MAX);
private final Map<String, Long> concurrentHashMap = new ConcurrentHashMap<>(MAX);
@Setup(Level.Trial)
public void setup() {
// Populate guava cache
for(int i = 0; i <= MAX; i++) {
try {
stringToLong.get(UUID.randomUUID().toString());
} catch (ExecutionException e) {
e.printStackTrace();
System.exit(1);
}
}
}
@Benchmark
public String stringToIdByIteration() {
Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);
for(Map.Entry<String, Long> entry : stringToLong.asMap().entrySet()) {
if(Objects.equals(randomNum, entry.getValue())) {
return entry.getKey();
}
}
System.out.println("Returning null as value not found " + randomNum);
return null;
}
@Benchmark
public String stringToIdByIterationHashTable() {
Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);
for(Map.Entry<String, Long> entry : mHashMap.entrySet()) {
if(Objects.equals(randomNum, entry.getValue())) {
return entry.getKey();
}
}
System.out.println("Returning null as value not found " + randomNum);
return null;
}
@Benchmark
public String stringToIdByIterationConcurrentHashMap() {
Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);
for(Map.Entry<String, Long> entry : concurrentHashMap.entrySet()) {
if(Objects.equals(randomNum, entry.getValue())) {
return entry.getKey();
}
}
System.out.println("concurrentHashMap Returning null as value not found " + randomNum);
return null;
}
private Long generateIdByString(final String mString) {
mHashMap.put(mString, counter++);
concurrentHashMap.put(mString, counter);
return counter;
}
}
What I noticed is when I change .concurrencyLevel(1)
to number different than 1 I'm starting to lose data. The following output is from concurrency level 4:
Iteration 1: Returning null as value not found 107
Returning null as value not found 43
Returning null as value not found 20
Returning null as value not found 77
Returning null as value not found 127
Returning null as value not found 35
Returning null as value not found 83
Returning null as value not found 43
Returning null as value not found 127
Returning null as value not found 107
Returning null as value not found 83
Returning null as value not found 82
Returning null as value not found 40
Returning null as value not found 58
Returning null as value not found 127
Returning null as value not found 114
Returning null as value not found 119
Returning null as value not found 43
Returning null as value not found 114
Returning null as value not found 18
Returning null as value not found 58
66.778 us/op
I noticed I never lose any data when using HashMap
or HashTable
for using the same code, it also performs much better:
Benchmark Mode Cnt Score Error Units
ValueByKey.stringToIdByIteration avgt 16 58.637 ± 15.094 us/op
ValueByKey.stringToIdByIterationConcurrentHashMap avgt 16 16.148 ± 2.046 us/op
ValueByKey.stringToIdByIterationHashTable avgt 16 11.705 ± 1.095 us/op
Is my code wrong or is that Guava can't properly handle partitioned HashTable with concurrency level higher than 1?
- The concurrency level option is used to partition the table internally such that updates can occur without contention.
- The ideal setting would be the maximum number of threads that could potentially access the cache at one time.