21

EDIT: I've reorganized this question to reflect the new information that since became available.

This question is based on the responses to a question by Viliam concerning Guava Maps' use of lazy eviction: Laziness of eviction in Guava's maps

Please read this question and its response first, but essentially the conclusion is that Guava maps do not asynchronously calculate and enforce eviction. Given the following map:

ConcurrentMap<String, MyObject> cache = new MapMaker()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .makeMap();

Once ten minutes has passed following access to an entry, it will still not be evicted until the map is "touched" again. Known ways to do this include the usual accessors - get() and put() and containsKey().

The first part of my question [solved]: what other calls cause the map to be "touched"? Specifically, does anyone know if size() falls into this category?

The reason for wondering this is that I've implemented a scheduled task to occasionally nudge the Guava map I'm using for caching, using this simple method:

public static void nudgeEviction() {
    cache.containsKey("");
}

However I'm also using cache.size() to programmatically report the number of objects contained in the map, as a way to confirm this strategy is working. But I haven't been able to see a difference from these reports, and now I'm wondering if size() also causes eviction to take place.

Answer: So Mark has pointed out that in release 9, eviction is invoked only by the get(), put(), and replace() methods, which would explain why I wasn't seeing an effect for containsKey(). This will apparently change with the next version of guava which is set for release soon, but unfortunately my project's release is set sooner.

This puts me in an interesting predicament. Normally I could still touch the map by calling get(""), but I'm actually using a computing map:

ConcurrentMap<String, MyObject> cache = new MapMaker()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .makeComputingMap(loadFunction);

where loadFunction loads the MyObject corresponding to the key from a database. It's starting to look like I have no easy way of forcing eviction until r10. But even being able to reliably force eviction is put into doubt by the second part of my question:

The second part of my question [solved]: In reaction to one of the responses to the linked question, does touching the map reliably evict all expired entries? In the linked answer, Niraj Tolia indicates otherwise, saying eviction is potentially only processed in batches, which would mean multiple calls to touch the map might be needed to ensure all expired objects were evicted. He did not elaborate, however this seems related to the map being split into segments based on concurrency level. Assuming I used r10, in which a containsKey("") does invoke eviction, would this then be for the entire map, or only for one of the segments?

Answer: maaartinus has addressed this part of the question:

Beware that containsKey and other reading methods only run postReadCleanup, which does nothing but on each 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with single Segment only.

So it looks like calling containsKey("") wouldn't be a viable fix, even in r10. This reduces my question to the title: How can I reliably force eviction to occur?

Note: Part of the reason my web app is noticeably affected by this issue is that when I implemented caching I decided to use multiple maps - one for each class of my data objects. So with this issue there is the possibility that one area of code is executed, causing a bunch of Foo objects to be cached, and then the Foo cache isn't touched again for a long time so it doesn't evict anything. Meanwhile Bar and Baz objects are being cached from other areas of code, and memory is being eaten. I'm setting a maximum size on these maps, but this is a flimsy safeguard at best (I'm assuming its effect is immediate - still need to confirm this).

UPDATE 1: Thanks to Darren for linking the relevant issues - they now have my votes. So it looks like a resolution is in the pipeline, but seems unlikely to be in r10. In the meantime, my question remains.

UPDATE 2: At this point I'm just waiting for a Guava team member to give feedback on the hack maaartinus and I put together (see answers below).

LAST UPDATE: feedback received!

Community
  • 1
  • 1
Paul Bellora
  • 54,340
  • 18
  • 130
  • 181
  • 1
    Issues http://code.google.com/p/guava-libraries/issues/detail?id=681 and http://code.google.com/p/guava-libraries/issues/detail?id=608 are related to this. – Darren Gilroy Aug 16 '11 at 08:23

7 Answers7

8

I just added the method Cache.cleanUp() to Guava. Once you migrate from MapMaker to CacheBuilder you can use that to force eviction.

fry
  • 581
  • 2
  • 5
  • 1
    what's the difference between `cleanUp()` and `invalidateAll()`? Both seem to be removing/evicting the entries in the cache present at that time. – asgs Sep 09 '15 at 09:53
  • `cleanUp()` should only get rid of *invalidated* entries, as per your cache configuration. If there are still valid entries present (i.e. haven't expired), it won't touch them. Conversely, `invalidateAll()` will literally invalidate all entries, even those that wouldn't otherwise be considered expired. – Jeff Evans Oct 29 '19 at 22:44
6

I was wondering the about the same issue you described in the first part of your question. From what I can tell from looking at the source code for Guava's CustomConcurrentHashMap (release 9), it appears that entries are evicted on the get(), put(), and replace() methods. The containsKey() method does not appear to invoke eviction. I'm not 100% sure because I took a quick pass at the code.

Update:

I also found a more recent version of the CustomConcurrentHashmap in Guava's git repository and it looks like containsKey() has been updated to invoke eviction.

Both release 9 and the latest version I just found do not invoke eviction when size() is called.

Update 2:

I recently noticed that Guava r10 (yet to be released) has a new class called CacheBuilder. Basically this class is a forked version of the MapMaker but with caching in mind. The documentation suggests that it will support some of the eviction requirements you are looking for.

I reviewed the updated code in r10's version of the CustomConcurrentHashMap and found what looks like a scheduled map cleaner. Unfortunately, that code appears unfinished at this point but r10 looks more and more promising each day.

Mark Bouchard
  • 851
  • 1
  • 9
  • 20
  • +1 nice job going to the source - I should've done the same. Interesting that `containsKey()` doesn't invoke eviction in release 9. I'm also using that release, which would explain why I'm not seeing an effect as there is none! – Paul Bellora Aug 15 '11 at 19:44
  • 1
    @Kublai Khan: I updated the links in my answer. Google recently switched from svn to git for Guava. – Mark Bouchard Aug 16 '11 at 20:00
  • thanks for the updates. I think `CacheBuilder` shows us the Guava team cares about these use cases and I'm excited to see them branching to cover them. – Paul Bellora Aug 23 '11 at 01:51
5

Beware that containsKey and other reading methods only run postReadCleanup, which does nothing but on each 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with single Segment only.

The easiest way to enforce eviction seems to be to put some dummy object into each segment. For this to work, you'd need to analyze CustomConcurrentHashMap.hash(Object), which is surely no good idea, as this method may change anytime. Moreover, depending on the key class it may be hard to find a key with a hashCode ensuring it lands in a given segment.

You could use reads instead, but would have to repeat them 64 times per segment. Here, it'd easy to find a key with an appropriate hashCode, since here any object is allowed as an argument.

Maybe you could hack into the CustomConcurrentHashMap source code instead, it could be as trivial as

public void runCleanup() {
    final Segment<K, V>[] segments = this.segments;
    for (int i = 0; i < segments.length; ++i) {
        segments[i].runCleanup();
    }
}

but I wouldn't do it without a lot of testing and/or an OK by a guava team member.

maaartinus
  • 44,714
  • 32
  • 161
  • 320
  • 1
    +1 thanks for the insight and for pointing out this hack. I've coded it into a reflection-based hack, posted as an answer below. The reason for this is that modifying the Guava source was inconvenient for my particular project. Anyway I'm hoping that some Guava people can weigh in on this soon. I'll award the bounty once I can confirm this as viable. – Paul Bellora Aug 22 '11 at 15:53
  • Thanks for the bounty, I'm glad it helped you to a working solution. – maaartinus Aug 24 '11 at 20:21
  • @maaartinus I have a similar question [here](http://stackoverflow.com/questions/41113560/how-to-reliably-drop-records-from-guava-loadingcache) on the Guava Loading Cache and I also need to do exact same thing. Confusion I have is, if I call `cleanUp()` every 1 minute from a background thread (using ScheduledExecutorService) then my RemovalListener will get called? I don't see that being called. Or may be I am mixing lot of things here.. I just want to load all the metrics in my cache and periodically drop those metrics every 1 minute by sending to some other system. – john Dec 13 '16 at 12:38
3

Yep, we've gone back and forth a few times on whether these cleanup tasks should be done on a background thread (or pool), or should be done on user threads. If they were done on a background thread, this would eventually happen automatically; as it is, it'll only happen as each segment gets used. We're still trying to come up with the right approach here - I wouldn't be surprised to see this change in some future release, but I also can't promise anything or even make a credible guess as to how it will change. Still, you've presented a reasonable use case for some kind of background or user-triggered cleanup.

Your hack is reasonable, as long as you keep in mind that it's a hack, and liable to break (possibly in subtle ways) in future releases. As you can see in the source, Segment.runCleanup() calls runLockedCleanup and runUnlockedCleanup: runLockedCleanup() will have no effect if it can't lock the segment, but if it can't lock the segment it's because some other thread has the segment locked, and that other thread can be expected to call runLockedCleanup as part of its operation.

Also, in r10, there's CacheBuilder/Cache, analogous to MapMaker/Map. Cache is the preferred approach for many current users of makeComputingMap. It uses a separate CustomConcurrentHashMap, in the common.cache package; depending on your needs, you may want your GuavaEvictionHacker to work with both. (The mechanism is the same, but they're different Classes and therefore different Methods.)

schmoe
  • 31
  • 1
  • +1 Thanks, this is the confirmation I was hoping for (albeit anonymous). Good point about the CacheBuilder should we move to r10. I enjoy working with Guava and look forward to future developments. – Paul Bellora Aug 24 '11 at 19:15
2

I'm not a big fan of hacking into or forking external code until absolutely necessary. This problem occurs in part due to an early decision for MapMaker to fork ConcurrentHashMap, thereby dragging in a lot of complexity that could have been deferred until after the algorithms were worked out. By patching above MapMaker, the code is robust to library changes so that you can remove your workaround on your own schedule.

An easy approach is to use a priority queue of weak reference tasks and a dedicated thread. This has the drawback of creating many stale no-op tasks, which can become excessive in due to the O(lg n) insertion penalty. It works reasonably well for small, less frequently used caches. It was the original approach taken by MapMaker and its simple to write your own decorator.

A more robust choice is to mirror the lock amortization model with a single expiration queue. The head of the queue can be volatile so that a read can always peek to determine if it has expired. This allows all reads to trigger an expiration and an optional clean-up thread to check regularly.

By far the simplest is to use #concurrencyLevel(1) to force MapMaker to use a single segment. This reduces the write concurrency, but most caches are read heavy so the loss is minimal. The original hack to nudge the map with a dummy key would then work fine. This would be my preferred approach, but the other two options are okay if you have high write loads.

Ben Manes
  • 9,178
  • 3
  • 35
  • 39
  • +1 Thanks for the insightful suggestions. Your middle two paragraphs are worth considering more long term (time is short for me at the moment). I would need to see code examples of the third paragraph to understand it fully, or else need to devote more time to understanding the lock amortization model. Thanks for giving this attention. – Paul Bellora Aug 24 '11 at 19:56
  • See this 1000 ft [design document](http://code.google.com/p/concurrentlinkedhashmap/wiki/Design) or the associated code, which is the basis for MapMaker's algorithms. Charles and I will have a presentation with more details available mid September. – Ben Manes Aug 24 '11 at 20:23
  • Thanks, I'll be sure to check out this document. I knew nothing of Guava's inner workings before exploring this issue, which taught me a lot about what goes on behind the scenes. Looking forward to learning more, and to the presentation you mention. – Paul Bellora Aug 24 '11 at 20:49
1

I don't know if it is appropriate for your use case, but your main concern about the lack of background cache eviction seems to be memory consumption, so I would have thought that using softValues() on the MapMaker to allow the Garbage Collector to reclaim entries from the cache when a low memory situation occurs. Could easily be the solution for you. I have used this on a subscription-server (ATOM) where entries are served through a Guava cache using SoftReferences for values.

Morten
  • 11
  • 2
  • Thanks, this was worth bringing up. Actually, I had been using `softValues()` originally and moved to time-based eviction after deciding they were too clumsy and unpredictable, at least with our JVM. See http://stackoverflow.com/questions/6592183/what-are-the-practical-consequences-of-using-soft-references and http://stackoverflow.com/questions/6778743/my-ideal-cache-using-guava for my thought process. – Paul Bellora Aug 25 '11 at 14:59
0

Based on maaartinus's answer, I came up with the following code which uses reflection rather than directly modifying the source (If you find this useful please upvote his answer!). While it will come at a performance penalty for using reflection, the difference should be negligible since I'll run it about once every 20 minutes for each caching Map (I'm also caching the dynamic lookups in the static block which will help). I have done some initial testing and it appears to work as intended:

public class GuavaEvictionHacker {

   //Class objects necessary for reflection on Guava classes - see Guava docs for info
   private static final Class<?> computingMapAdapterClass;
   private static final Class<?> nullConcurrentMapClass;
   private static final Class<?> nullComputingConcurrentMapClass;
   private static final Class<?> customConcurrentHashMapClass;
   private static final Class<?> computingConcurrentHashMapClass;
   private static final Class<?> segmentClass;

   //MapMaker$ComputingMapAdapter#cache points to the wrapped CustomConcurrentHashMap
   private static final Field cacheField;

   //CustomConcurrentHashMap#segments points to the array of Segments (map partitions)
   private static final Field segmentsField;

   //CustomConcurrentHashMap$Segment#runCleanup() enforces eviction on the calling Segment
   private static final Method runCleanupMethod;

   static {
      try {

         //look up Classes
         computingMapAdapterClass = Class.forName("com.google.common.collect.MapMaker$ComputingMapAdapter");
         nullConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullConcurrentMap");
         nullComputingConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullComputingConcurrentMap");
         customConcurrentHashMapClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap");
         computingConcurrentHashMapClass = Class.forName("com.google.common.collect.ComputingConcurrentHashMap");
         segmentClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap$Segment");

         //look up Fields and set accessible
         cacheField = computingMapAdapterClass.getDeclaredField("cache");
         segmentsField = customConcurrentHashMapClass.getDeclaredField("segments");
         cacheField.setAccessible(true);
         segmentsField.setAccessible(true);

         //look up the cleanup Method and set accessible
         runCleanupMethod = segmentClass.getDeclaredMethod("runCleanup");
         runCleanupMethod.setAccessible(true);
      }
      catch (ClassNotFoundException cnfe) {
         throw new RuntimeException("ClassNotFoundException thrown in GuavaEvictionHacker static initialization block.", cnfe);
      }
      catch (NoSuchFieldException nsfe) {
         throw new RuntimeException("NoSuchFieldException thrown in GuavaEvictionHacker static initialization block.", nsfe);
      }
      catch (NoSuchMethodException nsme) {
         throw new RuntimeException("NoSuchMethodException thrown in GuavaEvictionHacker static initialization block.", nsme);
      }
   }

   /**
    * Forces eviction to take place on the provided Guava Map. The Map must be an instance
    * of either {@code CustomConcurrentHashMap} or {@code MapMaker$ComputingMapAdapter}.
    * 
    * @param guavaMap the Guava Map to force eviction on.
    */
   public static void forceEvictionOnGuavaMap(ConcurrentMap<?, ?> guavaMap) {

      try {

         //we need to get the CustomConcurrentHashMap instance
         Object customConcurrentHashMap;

         //get the type of what was passed in
         Class<?> guavaMapClass = guavaMap.getClass();

         //if it's a CustomConcurrentHashMap we have what we need
         if (guavaMapClass == customConcurrentHashMapClass) {
            customConcurrentHashMap = guavaMap;
         }
         //if it's a NullConcurrentMap (auto-evictor), return early
         else if (guavaMapClass == nullConcurrentMapClass) {
            return;
         }
         //if it's a computing map we need to pull the instance from the adapter's "cache" field
         else if (guavaMapClass == computingMapAdapterClass) {
            customConcurrentHashMap = cacheField.get(guavaMap);
            //get the type of what we pulled out
            Class<?> innerCacheClass = customConcurrentHashMap.getClass();
            //if it's a NullComputingConcurrentMap (auto-evictor), return early
            if (innerCacheClass == nullComputingConcurrentMapClass) {
               return;
            }
            //otherwise make sure it's a ComputingConcurrentHashMap - error if it isn't
            else if (innerCacheClass != computingConcurrentHashMapClass) {
               throw new IllegalArgumentException("Provided ComputingMapAdapter's inner cache was an unexpected type: " + innerCacheClass);
            }
         }
         //error for anything else passed in
         else {
            throw new IllegalArgumentException("Provided ConcurrentMap was not an expected Guava Map: " + guavaMapClass);
         }

         //pull the array of Segments out of the CustomConcurrentHashMap instance
         Object[] segments = (Object[])segmentsField.get(customConcurrentHashMap);

         //loop over them and invoke the cleanup method on each one
         for (Object segment : segments) {
            runCleanupMethod.invoke(segment);
         }
      }
      catch (IllegalAccessException iae) {
         throw new RuntimeException(iae);
      }
      catch (InvocationTargetException ite) {
         throw new RuntimeException(ite.getCause());
      }
   }
}

I'm looking for feedback on whether this approach is advisable as a stopgap until the issue is resolved in a Guava release, particularly from members of the Guava team when they get a minute.

EDIT: updated the solution to allow for auto-evicting maps (NullConcurrentMap or NullComputingConcurrentMap residing in a ComputingMapAdapter). This turned out to be necessary in my case, since I'm calling this method on all of my maps and a few of them are auto-evictors.

Paul Bellora
  • 54,340
  • 18
  • 130
  • 181
  • IMHO, the performance penalty is not worth mentioning as the cleanup is quite a non-trivial operation. That said, I'd prefer hacking the source code to reflection, since it's much shorter and thus less prone to errors. When guava changes, both may break, anyway. Sure, you may have some good reasons for preferring reflection. – maaartinus Aug 22 '11 at 15:59
  • @maaartinus - actually, just getting to the `CustomConcurrentHashMap` turned out to be most the battle - something like this may have been necessary anyway, or else many small interrelated hacks made to multiple guava classes. Still owe you for coming up with the essential code though. – Paul Bellora Aug 23 '11 at 01:47