
We use a Google Guava LoadingCache for bitmaps in an Android application. The application runs a drawing thread that paints the bitmaps in the cache to a Canvas. If a specific bitmap is not in the cache, it is simply skipped, so that loading never blocks the drawing thread.

However, the painting results in visible stuttering, and the frame rate is not what we would like. I nailed it down to the getIfPresent() method of the cache. That method alone takes over 20% of the application's total CPU time, and within getIfPresent(), LocalCache$Segment.get() accounts for over 80% of the time:

[Profiler screenshot: profiling-guava-cache.jpg]

Bear in mind, this is only a lookup of an already present bitmap; a load never happens in get(). I figured there would be some bookkeeping overhead in get() for the LRU queue that decides which entry to evict when the segment is full, but this is at least an order of magnitude slower than what a key lookup in an LRU LinkedHashMap.get() would give me.

We use a cache to get fast lookups of whether an element is present; if the lookup is slow, there is no point in caching. I also tried getAllPresent(a) and asMap(), but they give equal performance.

Library version is: guava-11.0.1.jar

LoadingCache is defined as follows:

LoadingCache<TileKey, Bitmap> tiles = CacheBuilder.newBuilder()
        .maximumSize(100)
        .build(new CacheLoader<TileKey, Bitmap>() {
            @Override
            public Bitmap load(TileKey tileKey) {
                System.out.println("Loading in " + Thread.currentThread().getName()
                        + " " + tileKey.x + "-" + tileKey.y);

                final File[][] tileFiles = surfaceState.mapFile.getBuilding()
                        .getFloors().get(tileKey.floorid)
                        .getBackground(tileKey.zoomid).getTileFiles();
                String tilePath = tileFiles[tileKey.y][tileKey.x].getAbsolutePath();

                Options options = new BitmapFactory.Options();
                options.inPreferredConfig = Bitmap.Config.RGB_565;

                return BitmapFactory.decodeFile(tilePath, options);
            }
        });

My questions are:

  • Do I use it wrong?
  • Is its implementation unsuitable for Android?
  • Did I miss a configuration option?
  • Is this a known issue with the Cache that's being worked on?

Update:

After about 100 frames painted the CacheStats are:

I/System.out( 6989): CacheStats{hitCount=11992, missCount=97,
loadSuccessCount=77, loadExceptionCount=0, totalLoadTime=1402984624, evictionCount=0}

After that, missCount stays basically the same while hitCount increments. In this case the cache is big enough for loads to happen sparsely, but getIfPresent() is slow nonetheless.
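The stats reported above pin this down: the hit rate can be computed directly from hitCount and missCount. A quick sanity check in plain Java (CacheStatsCheck is a hypothetical helper name, using the figures printed above):

```java
public class CacheStatsCheck {
    // Hit rate = hitCount / (hitCount + missCount), per the CacheStats output above.
    static double hitRate(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    public static void main(String[] args) {
        double rate = hitRate(11992, 97);
        System.out.printf("hit rate: %.4f%n", rate); // ~0.9920
    }
}
```

So over 99% of lookups are served straight from the cache; the cost showing up in the profiler is the lookup itself, not loading.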

user643011
  • Could you post the results of `tiles.cacheStats()` ? – Louis Wasserman Feb 14 '12 at 19:46
  • (This is probably worth sending to guava-discuss and/or filing an issue, by the way.) – Louis Wasserman Feb 14 '12 at 19:51
  • Why do you use a cache for it? Wouldn't a simple array do the job as well? (Draw all items in an pre-allocated array that are not null) – Thomas Jungblut Feb 14 '12 at 20:00
  • @Thomas Jungblut: We would like it to buffer tiles of different zoom levels or backgrounds. We have complex logic that really gains from an LRU cache. I implemented it with Guava because it is flexible and provides variable weights for varyingly expensive elements. I am happy with it, but the getIfPresent() cost is unexpected. – user643011 Feb 14 '12 at 20:07
  • Not sure how much that would help, but... What's the concurrency story? Can you play with the concurrencyLevel? – Etienne Neveu Feb 14 '12 at 20:48
  • Louis, it's not a bug. Many of the design decisions assume server-class machines, such as MESI-based cache coherency or that volatile reads / CAS are cheap if uncontended (on x86). ARM processors have very different trade-offs and are not high-performance multi-threading monsters. – Ben Manes Feb 14 '12 at 21:19
  • OK, setting aside the design decisions targeting server machines: out of curiosity, does a single-core ARM CPU suffer from cache coherency problems too? Does Dalvik make use of LDREX/STREX on ARM? – user643011 Feb 14 '12 at 21:43
  • I don't know enough about ARM, just bits from the Dalvik GC guys. It's best to ask the concurrency-interest group. – Ben Manes Feb 14 '12 at 23:23
  • I wonder if this was actually caused by the now-fixed [issue 1055](https://code.google.com/p/guava-libraries/issues/detail?id=1055&can=1&q=segment&colspec=ID%20Stars%20Type%20Status%20Package%20Summary). – Louis Wasserman Jul 13 '12 at 16:19
  • @LouisWasserman were you able to find out whether this issue was due to that bug, or whether it's better to use an LRU cache or a LinkedHashMap? – Akhil Dad Apr 23 '14 at 18:14

2 Answers


CacheBuilder was designed for server-side caching, where concurrency is the primary concern. It therefore trades single-threaded performance and memory overhead for better multi-threaded behavior. Android developers should use LruCache, LinkedHashMap, or similar where single-threaded performance and memory are the primary concerns. In the future there may be a concurrencyLevel=0 option to indicate that a lightweight, non-concurrent cache is required.
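A rough sketch of what such a concurrencyLevel=0 cache could look like: no segments, no per-entry concurrency bookkeeping, just an access-ordered LinkedHashMap behind coarse synchronization. SimpleLruCache is a hypothetical name (not a Guava or Android class), and this is an illustration of the trade-off, not any library's actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical non-concurrent LRU cache: every access holds one lock,
// which is cheap when there is effectively a single reader/writer thread.
public class SimpleLruCache<K, V> {
    private final Map<K, V> map;

    public SimpleLruCache(final int maxSize) {
        // accessOrder=true moves entries to the tail on get(), giving LRU order.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public synchronized V getIfPresent(K key) { return map.get(key); }

    public synchronized void put(K key, V value) { map.put(key, value); }

    public synchronized int size() { return map.size(); }
}
```

On a single-core ARM device, the uncontended monitor here is far cheaper than the volatile reads and CAS operations a segmented concurrent cache performs on every lookup.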

Ben Manes
  • I have tried concurrencyLevel=1, but it did not improve the access time, and concurrencyLevel=0 leads to an error. Indeed, the AtomicReferenceArray takes a significant amount of CPU time. I'll go with android.util.LruCache for now. Thanks for your comment regarding the ARM architecture and your detailed answer. I would love to see concurrencyLevel=0 in the future. – user643011 Feb 14 '12 at 21:29
  • concurrencyLevel=1 means a single writer and multiple readers. That doesn't provide much of a simplification to work from. A level of 0 would indicate that readers are synchronized with writes, so a synchronized version could be implemented. When I code-reviewed LruCache we were still working on the Cache interface, so it was low priority to add concurrencyLevel=0 to MapMaker. Thus LruCache was born. – Ben Manes Feb 14 '12 at 21:44
  • FYI, LruCache is also available pre-Honeycomb in the compatibility library: http://developer.android.com/sdk/compatibility-library.html – Jesse Wilson Feb 15 '12 at 17:34

Android code is not always optimised the same way as it is on a standard JVM. What performs very well in Java on a server might not perform as well on Android. I suggest you write a very simple cache of your own, e.g. using LinkedHashMap.removeEldestEntry(), and see how that goes.
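That suggestion fits in a few lines. A minimal sketch (TileCacheSketch and newLruMap are hypothetical names; the 100-entry bound mirrors maximumSize(100) from the question):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TileCacheSketch {
    static final int MAX_TILES = 100; // mirrors maximumSize(100) from the question

    // Access-ordered LinkedHashMap that evicts its eldest entry past MAX_TILES.
    // Use map.get(key) in the drawing thread and skip drawing on null.
    static <K, V> Map<K, V> newLruMap() {
        return new LinkedHashMap<K, V>(MAX_TILES * 4 / 3, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > MAX_TILES;
            }
        };
    }
}
```

If loads happen on a different thread than the lookups, the map would still need external synchronization, e.g. Collections.synchronizedMap(...), since LinkedHashMap mutates its internal order even on get().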

Peter Lawrey