
I started benchmarking ways to find a key by its value in a Guava cache and noticed strange behaviour related to the concurrency level. I'm not sure whether it's a bug, undefined behaviour, or perhaps expected but unspecified.

My benchmark is supposed to find a key by its value in a Guava Cache, not a usual thing to do, I know.

This is my complete benchmark class:

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import org.openjdk.jmh.annotations.*;

import java.util.Hashtable;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

@Fork(4)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 1, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 4, time = 100, timeUnit = TimeUnit.MILLISECONDS)
public class ValueByKey {

    private Long counter = 0L;

    private final int MAX = 2500;

    private final LoadingCache<String, Long> stringToLong = CacheBuilder.newBuilder()
        .concurrencyLevel(1)
        .maximumSize(MAX + 5)
        .build(new CacheLoader<String, Long>() {
            public Long load(String mString) {
                return generateIdByString(mString);
            }
        });

    private final Map<String, Long> mHashMap = new Hashtable<>(MAX);
    private final Map<String, Long> concurrentHashMap = new ConcurrentHashMap<>(MAX);

    @Setup(Level.Trial)
    public void setup() {
        // Populate guava cache
        for(int i = 0; i <= MAX; i++) {
            try {
                stringToLong.get(UUID.randomUUID().toString());
            } catch (ExecutionException e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }

    @Benchmark
    public String stringToIdByIteration() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : stringToLong.asMap().entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

    @Benchmark
    public String stringToIdByIterationHashTable() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : mHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

    @Benchmark
    public String stringToIdByIterationConcurrentHashMap() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : concurrentHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("concurrentHashMap Returning null as value not found " + randomNum);
        return null;
    }

    private Long generateIdByString(final String mString) {
        counter++; // increment first so all three maps store the same id
        mHashMap.put(mString, counter);
        concurrentHashMap.put(mString, counter);
        return counter;
    }

}

What I noticed is that when I change .concurrencyLevel(1) to a number other than 1, I start losing data. The following output is from concurrency level 4:

Iteration   1: Returning null as value not found 107
Returning null as value not found 43
Returning null as value not found 20
Returning null as value not found 77
Returning null as value not found 127
Returning null as value not found 35
Returning null as value not found 83
Returning null as value not found 43
Returning null as value not found 127
Returning null as value not found 107
Returning null as value not found 83
Returning null as value not found 82
Returning null as value not found 40
Returning null as value not found 58
Returning null as value not found 127
Returning null as value not found 114
Returning null as value not found 119
Returning null as value not found 43
Returning null as value not found 114
Returning null as value not found 18
Returning null as value not found 58
66.778 us/op

I noticed I never lose any data when using the Hashtable or ConcurrentHashMap with the same code, and they also perform much better:

Benchmark                                           Mode  Cnt   Score    Error  Units
ValueByKey.stringToIdByIteration                    avgt   16  58.637 ± 15.094  us/op
ValueByKey.stringToIdByIterationConcurrentHashMap   avgt   16  16.148 ±  2.046  us/op
ValueByKey.stringToIdByIterationHashTable           avgt   16  11.705 ±  1.095  us/op

Is my code wrong, or is it that Guava can't properly handle a partitioned hash table with a concurrency level higher than 1?

From the documentation of concurrencyLevel:

  • The concurrency level option is used to partition the table internally such that updates can occur without contention.
  • The ideal setting would be the maximum number of threads that could potentially access the cache at one time.
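As a sketch of that guidance (the processor-count choice is my own illustration, reusing MAX and generateIdByString from the class above, not something from the docs), the level can be sized to the expected number of concurrent writers:

private final LoadingCache<String, Long> sizedCache = CacheBuilder.newBuilder()
        // size the level to the number of threads expected to update the cache at once
        .concurrencyLevel(Runtime.getRuntime().availableProcessors())
        .maximumSize(MAX + 5)
        .build(new CacheLoader<String, Long>() {
            public Long load(String mString) {
                return generateIdByString(mString);
            }
        });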
  • I would suggest running the same test against Java's ConcurrentHashMap to see if you can observe similar issues there. If yes, then it becomes a HashMap versus ConcurrentHashMap question (which I think is at the heart of the problem), without involving the extra complexity of the Guava cache. – Artur Biesiadowski Jan 26 '18 at 11:02
  • @ArturBiesiadowski Updated the question with ConcurrentHashMap; it behaves the same as Hashtable and HashMap and doesn't drop any values. I can add that I see the same behaviour when I iterate using stream or parallelStream -> filter -> limit -> reduce. – agilob Jan 26 '18 at 11:21

1 Answer


No cache guarantees cache hits all of the time

The presence or absence of data in a cache is determined by its eviction policy (and by the data being loaded into the cache in the first place).

Since you have used CacheBuilder.maximumSize(MAX + 5), your cache uses size-based eviction and may start removing elements before it reaches the preset maximum size.

With the concurrency level set to 4, Guava Cache plays it safe and sets the eviction threshold a bit lower, which makes sense, as elements can keep arriving while others are being evicted. Internally the table is split into that many segments, each with roughly its share of the maximum size, so a single full segment can start evicting even while the cache as a whole is still under the limit.

This is why your elements start 'disappearing'.
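Here is a minimal standalone sketch (my own, not from the original post) demonstrating the effect; with an even hash distribution the counts will vary run to run:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class EarlyEvictionDemo {
    public static void main(String[] args) {
        Cache<Integer, Integer> cache = CacheBuilder.newBuilder()
                .concurrencyLevel(4)   // the table is split into segments
                .maximumSize(100)      // the budget is divided among the segments
                .build();

        for (int i = 0; i < 100; i++) {
            cache.put(i, i);
        }

        // With concurrencyLevel(1) this prints 100; with 4 it may print less,
        // because a full segment evicts even though the cache as a whole
        // is still under maximumSize.
        System.out.println("entries kept: " + cache.size());
    }
}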

To test this, make your class implement the RemovalListener interface:

public class ValueByKey implements RemovalListener<String, Long> { 
    //...
    @Override
    public void onRemoval(RemovalNotification<String, Long> notification) {
        System.out.println("removed: " + notification.getKey() + " -> " + notification.getValue());
    }
    //...
}
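Implementing the interface is not enough on its own; the listener also has to be registered when the cache is built, otherwise onRemoval() is never invoked. A minimal sketch adapting the builder from the question:

private final LoadingCache<String, Long> stringToLong = CacheBuilder.newBuilder()
        .concurrencyLevel(4)
        .maximumSize(MAX + 5)
        .removalListener(this)   // register this class as the removal listener
        .build(new CacheLoader<String, Long>() {
            public Long load(String mString) {
                return generateIdByString(mString);
            }
        });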

...and while running the tests you will notice evictions that match the missing values:

# Warmup Iteration   1: 
removed: 110c0a73-1dc3-40ee-8909-969e6dee0ea0 -> 3
removed: 6417015a-f154-467f-b3bf-3b95831ac5b7 -> 6
removed: 5bc206f9-67ec-49a2-8471-b386ffc03988 -> 14
removed: 3c0a33e1-1fe1-4e42-b262-bf6a3e8c53f7 -> 21
Returning null as value not found 14
Returning null as value not found 14
Returning null as value not found 3
64.778 us/op
Iteration   1: 
Returning null as value not found 21
Returning null as value not found 21
Returning null as value not found 6
37.719 us/op
[...]

I can imagine that the threshold calculation for eviction could be complex, but on my machine increasing the maximum size by 5% (CacheBuilder.maximumSize(Math.round(MAX * 1.05))) prevented ALL evictions when running your benchmarks.
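As a sketch of that workaround (the 5% headroom is just what worked on my machine, not a documented constant), the builder from the question becomes:

private final LoadingCache<String, Long> stringToLong = CacheBuilder.newBuilder()
        .concurrencyLevel(4)
        .maximumSize(Math.round(MAX * 1.05))  // ~5% headroom over the intended capacity
        .build(new CacheLoader<String, Long>() {
            public Long load(String mString) {
                return generateIdByString(mString);
            }
        });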

  • Nice, can anything bad happen if I don't set maximumSize? – agilob Jan 26 '18 at 13:00
  • You can opt for time-based eviction (see `expireAfterAccess` and `expireAfterWrite`) or for reference-based eviction, which uses weak or soft references and allows the garbage collector to clean up unused elements. If you want to keep all the elements forever, don't use a cache. – diginoise Jan 26 '18 at 13:36
  • Yes, I want to use `expireAfterAccess` with 30 minutes eviction (not in tests, but on prod). I thought I also wanted max 2500 elements in the cache, but apparently that's not the way to go. – agilob Jan 26 '18 at 14:03
  • 1
    [Caffeine](https://github.com/ben-manes/caffeine) evicts after the threshold, so that might be another option. – Ben Manes Jan 26 '18 at 18:06