How can I run a background thread that cleans up some elements in a list regularly?

Question

I am currently implementing a cache. I have completed a basic implementation, shown below. What I want to do is run a thread that removes entries that satisfy certain conditions.

    class Cache {
        int timeLimit = 10; //how long each entry needs to be kept after accessed(marked)
        int maxEntries = 10; //maximum number of Entries
        HashSet<String> set = new HashSet<String>();
        public void add(Entry t){
            ....
        }

        public Entry access(String key){
            //mark Entry that it has been used
            //Since it has been marked, background thread should remove this entry after timeLimit seconds.
            return set.get(key);
        }
        ....
    }

My question is: how should I implement a background thread that goes through the entries in the set and removes the ones that have been marked && (now - last access time) > timeLimit?

Edit

The code above is only a simplified version, which is why I did not write any synchronized statements.

Your `Cache` does not appear to be thread-safe; if you want to access it from multiple threads, you had better work on that first. — The Nail, Feb 19 '12 at 20:24
An alternative would be to call the cleanup from the `add` method; then you don't need an extra thread. Whether or not that is an option for you depends on your performance requirements. — The Nail, Feb 19 '12 at 20:26
Or make all methods `synchronized` (currently `access` and `add`, later maybe `cleanup` as well). — The Nail, Feb 19 '12 at 20:30

Tomasz Nurkiewicz · Accepted Answer · 2012-02-19T20:49:15.277

Why are you reinventing the wheel? EhCache (and any decent cache implementation) will do this for you. Also, the much more lightweight ~~MapMaker~~ Cache from Guava can remove old entries automatically.

If you really want to implement this yourself, it is not really that simple.

Remember synchronization. You should use ConcurrentHashMap or the synchronized keyword to store entries. This might be really tricky.
You must store the last access time of each entry somehow. Every time you access an entry, you must update that timestamp.
Think about eviction policy. If there are more than maxEntries in your cache, which ones to remove first?
Do you really need a background thread?

Surprisingly, EhCache (enterprise-ready and proven) does not use a background thread to invalidate old entries. Instead, it waits until the map is full and removes entries lazily. This looks like a good trade-off, as threads are expensive.
If you have a background thread, should there be one per cache or one global? Do you start a new thread while creating a new cache or have a global list of all caches? This is harder than you think...

Once you answer all these questions, the implementation is fairly simple: go through all the entries every second or so and if the condition you've already written is met, remove the entry.
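A minimal sketch of that sweep, assuming a `ConcurrentHashMap` for storage and a `ScheduledExecutorService` (swapped in here instead of a hand-written thread) to run it periodically. The class `SweptCache`, its methods, and modeling "marked" as a boolean flag are all hypothetical; `maxEntries` eviction is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical cache: an entry becomes eligible for removal once it has been
// accessed ("marked") and then left idle for longer than the time limit.
class SweptCache<K, V> {
    private static final class Entry<V> {
        final V value;
        volatile boolean marked;       // set on access
        volatile long lastAccessNanos; // refreshed on every access

        Entry(V value) {
            this.value = value;
            this.lastAccessNanos = System.nanoTime();
        }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long timeLimitNanos;
    private final ScheduledExecutorService sweeper;

    SweptCache(long timeLimit, TimeUnit unit, long sweepPeriodMillis) {
        this.timeLimitNanos = unit.toNanos(timeLimit);
        this.sweeper = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-sweeper");
            t.setDaemon(true); // the sweeper alone should not keep the JVM alive
            return t;
        });
        sweeper.scheduleWithFixedDelay(this::sweep,
                sweepPeriodMillis, sweepPeriodMillis, TimeUnit.MILLISECONDS);
    }

    void add(K key, V value) {
        map.put(key, new Entry<>(value));
    }

    V access(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        // Update the timestamp before marking, so a concurrent sweep never
        // sees a marked entry with a stale timestamp.
        e.lastAccessNanos = System.nanoTime();
        e.marked = true;
        return e.value;
    }

    // One sweep: remove entries that are marked AND idle longer than the limit.
    void sweep() {
        long now = System.nanoTime();
        map.values().removeIf(e -> e.marked && now - e.lastAccessNanos > timeLimitNanos);
    }

    int size() {
        return map.size();
    }

    void shutdown() {
        sweeper.shutdownNow();
    }
}
```

The sweep is best-effort: an entry accessed at the very moment it is being removed may still disappear right after that access. Calling `shutdown()` stops the sweeper thread.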

Not to mention that once you have a thread, it will hold a strong reference that creates a leak (unless you ensure that only a weak reference is held while it sleeps). — ratchet freak, Feb 19 '12 at 20:41
`MapMaker` is pretty heavily out of date; prefer the newer `Cache`. — Louis Wasserman, Feb 19 '12 at 20:44
@LouisWasserman: thanks, I haven't actually used Guava. I updated my answer. — Tomasz Nurkiewicz, Feb 19 '12 at 20:49
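To illustrate ratchet freak's point, here is a sketch (the names `WeaklySweptCache` and `Sweeper` are hypothetical) in which the cleanup thread holds only a `WeakReference` to its cache and takes a strong reference just for the duration of one sweep. Once the cache is unreachable elsewhere, `ref.get()` eventually returns `null` and the thread exits instead of pinning the cache in memory; exactly when that happens is up to the garbage collector.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical cache whose cleanup thread only holds a WeakReference to it,
// so an otherwise-unreachable cache can still be garbage-collected.
final class WeaklySweptCache<K, V> {
    private static final class Timed<V> {
        final V value;
        volatile long lastAccessNanos = System.nanoTime();

        Timed(V value) {
            this.value = value;
        }
    }

    private final Map<K, Timed<V>> map = new ConcurrentHashMap<>();
    private final long timeLimitNanos;

    private WeaklySweptCache(long timeLimitMillis) {
        this.timeLimitNanos = TimeUnit.MILLISECONDS.toNanos(timeLimitMillis);
    }

    // Factory, so the thread is not started from inside the constructor.
    static <K, V> WeaklySweptCache<K, V> create(long timeLimitMillis, long sweepPeriodMillis) {
        WeaklySweptCache<K, V> cache = new WeaklySweptCache<>(timeLimitMillis);
        Thread t = new Thread(new Sweeper(cache, sweepPeriodMillis), "cache-sweeper");
        t.setDaemon(true);
        t.start();
        return cache;
    }

    void put(K key, V value) {
        map.put(key, new Timed<>(value));
    }

    V get(K key) {
        Timed<V> t = map.get(key);
        if (t == null) {
            return null;
        }
        t.lastAccessNanos = System.nanoTime();
        return t.value;
    }

    int size() {
        return map.size();
    }

    private void sweep() {
        long now = System.nanoTime();
        map.values().removeIf(e -> now - e.lastAccessNanos > timeLimitNanos);
    }

    // Static nested class: no hidden strong reference to an enclosing cache.
    private static final class Sweeper implements Runnable {
        private final WeakReference<WeaklySweptCache<?, ?>> ref;
        private final long periodMillis;

        Sweeper(WeaklySweptCache<?, ?> cache, long periodMillis) {
            this.ref = new WeakReference<>(cache);
            this.periodMillis = periodMillis;
        }

        @Override
        public void run() {
            while (true) {
                WeaklySweptCache<?, ?> cache = ref.get();
                if (cache == null) {
                    return; // the cache was collected, so the thread ends too
                }
                cache.sweep();
                cache = null; // drop the strong reference before sleeping
                try {
                    Thread.sleep(periodMillis);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }
}
```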

Irfy · Answer 2 · 2012-02-19T21:38:24.703

First, either synchronize access to your collection or use ~~ConcurrentHashSet~~ a ConcurrentHashMap-based Set, as indicated in the comments below.

Second, write your new thread and implement it as an endless loop that periodically iterates over that collection and removes the expired elements. Write the class so that it is given the correct collection in its constructor; that way you do not have to worry about "how do I access the proper collection".
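A sketch of that design, assuming the collection is thread-safe (here a `ConcurrentHashMap`-backed set, as suggested in the comments) and that the removal condition is passed in as a `Predicate`. `CollectionSweeper` and its parameters are hypothetical. The "endless" loop ends when the thread is interrupted, so it can still be stopped.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sweeper: the collection to clean is passed in through the
// constructor, so the thread never has to go looking for it.
class CollectionSweeper<T> implements Runnable {
    private final Collection<T> items;          // must be thread-safe
    private final Predicate<? super T> expired; // hypothetical removal condition
    private final long periodMillis;

    CollectionSweeper(Collection<T> items, Predicate<? super T> expired, long periodMillis) {
        this.items = items;
        this.expired = expired;
        this.periodMillis = periodMillis;
    }

    @Override
    public void run() {
        // "Endless" loop that still ends cleanly when the thread is interrupted.
        while (!Thread.currentThread().isInterrupted()) {
            items.removeIf(expired);
            try {
                Thread.sleep(periodMillis);
            } catch (InterruptedException e) {
                // Restore the flag so the loop condition sees it and exits.
                Thread.currentThread().interrupt();
            }
        }
    }

    // A thread-safe Set backed by a ConcurrentHashMap.
    static <E> Set<E> concurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<>());
    }
}
```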

edited Feb 19 '12 at 21:38

answered Feb 19 '12 at 20:27

Irfy

That would be `Collections.newSetFromMap(new ConcurrentHashMap())` – Peter Lawrey Feb 19 '12 at 20:51

score 0 · Answer 3 · answered Feb 19 '12 at 20:32

I'd use Guava's Cache type for this, personally. It's already thread-safe and has methods built in for eviction from the cache based on some time limit. If you want a thread to periodically sweep it, you can just do something like this:

    new Thread(new Runnable() {
        public void run() {
            cache.cleanUp();
            try { Thread.sleep(MY_SLEEP_DURATION); } catch (Exception e) {};
        }
    }).start();

answered Feb 19 '12 at 20:32

cutchin

So the thread will run only once and stop? Or should it be started every time the cache is accessed, causing resource exhaustion? 0_o – Boris Treukhov Feb 19 '12 at 20:41
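As the comment points out, the `run()` method above executes its body only once. A corrected sketch loops until interrupted and treats `InterruptedException` as the stop signal instead of swallowing it. `PeriodicCleaner` is a hypothetical helper; with Guava you would pass something like `cache::cleanUp` as the task and start the thread once, when the cache is created.

```java
// Hypothetical helper: runs the given cleanup task in a loop on a daemon
// thread until that thread is interrupted.
final class PeriodicCleaner {
    private PeriodicCleaner() {
    }

    static Thread start(Runnable cleanUp, long sleepMillis) {
        Thread t = new Thread(() -> {
            // Loop, so the cleanup runs repeatedly rather than once.
            while (!Thread.currentThread().isInterrupted()) {
                cleanUp.run();
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // The interrupt is the stop signal; don't swallow it.
                    return;
                }
            }
        }, "cache-cleanup");
        t.setDaemon(true); // don't keep the JVM alive just for cleanup
        t.start();
        return t;
    }
}
```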

score 0 · Answer 4 · answered Feb 19 '12 at 20:53

I don't think you really need a background thread. Instead, you can just remove expired entries before or after you perform a lookup. This simplifies the entire implementation, and it's very hard to tell the difference.
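A sketch of the no-thread approach, assuming a `ConcurrentHashMap` and a plain idle-time limit (the "marked" flag from the question is left out); `LazyExpiringCache` is a hypothetical name. Expired entries are purged at the start of every lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical cache with no background thread: expired entries are purged
// as a side effect of each lookup.
class LazyExpiringCache<K, V> {
    private static final class Timed<V> {
        final V value;
        final long lastAccessNanos;

        Timed(V value, long lastAccessNanos) {
            this.value = value;
            this.lastAccessNanos = lastAccessNanos;
        }
    }

    private final Map<K, Timed<V>> map = new ConcurrentHashMap<>();
    private final long timeLimitNanos;

    LazyExpiringCache(long timeLimitMillis) {
        this.timeLimitNanos = TimeUnit.MILLISECONDS.toNanos(timeLimitMillis);
    }

    void put(K key, V value) {
        map.put(key, new Timed<>(value, System.nanoTime()));
    }

    V get(K key) {
        long now = System.nanoTime();
        // Purge before the lookup. This is O(n) per call, so a real cache
        // might check only the looked-up entry, or purge every Nth call.
        map.values().removeIf(e -> now - e.lastAccessNanos > timeLimitNanos);
        Timed<V> t = map.get(key);
        if (t == null) {
            return null;
        }
        // Refresh the access time (only if nobody replaced the entry meanwhile).
        map.replace(key, t, new Timed<>(t.value, now));
        return t.value;
    }

    int size() {
        return map.size();
    }
}
```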

BTW: If you use a LinkedHashMap, you can use it as an LRU cache by overriding removeEldestEntry (see its Javadoc for an example).
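A minimal sketch of the LRU variant. The three-argument `LinkedHashMap` constructor with `accessOrder = true` is what makes `get` move an entry to the end, so that the "eldest" entry is the least recently used one; `LruCache` and `maxEntries` are illustrative names. `LinkedHashMap` is not thread-safe, so a shared instance would need `Collections.synchronizedMap` or external locking.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// accessOrder = true: iteration order runs from least to most recently
// accessed, so the "eldest" entry is the LRU one.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by put/putAll after inserting; returning true evicts the eldest.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```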

answered Feb 19 '12 at 20:53

Peter Lawrey

Yeah, I found `removeEldestEntry` very feasible. But I wanted to try evicting the entries that satisfy the two conditions described in the last sentence of the question. Thanks for the info, though! – user482594 Feb 19 '12 at 20:59
You can add those checks by overriding get() as well. – Peter Lawrey Feb 19 '12 at 21:44
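Following the suggestion to override `get()`, here is a sketch that covers both conditions from the question: an entry gets a timestamp only once it has been accessed ("marked"), and a lookup that finds it idle for longer than the limit evicts it. `MarkedExpiryCache` is a hypothetical name. Expiry happens only when the key is looked up (there is no thread), and entries removed through the map's views (e.g. `keySet().remove`) would leave a stale timestamp behind.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU map that also expires entries which were accessed
// ("marked") and then left idle for longer than timeLimitMillis.
// Not thread-safe: synchronize externally if it is shared.
class MarkedExpiryCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;
    private final long timeLimitNanos;
    // Only accessed ("marked") keys get a timestamp.
    private final Map<Object, Long> lastAccess = new HashMap<>();

    MarkedExpiryCache(int maxEntries, long timeLimitMillis) {
        super(16, 0.75f, true); // access order, for LRU eviction
        this.maxEntries = maxEntries;
        this.timeLimitNanos = timeLimitMillis * 1_000_000L;
    }

    @Override
    public V get(Object key) {
        long now = System.nanoTime();
        Long last = lastAccess.get(key);
        if (last != null && now - last > timeLimitNanos) {
            remove(key); // marked and idle too long: evict on lookup
            return null;
        }
        if (!containsKey(key)) {
            return null;
        }
        lastAccess.put(key, now); // mark the entry / refresh its access time
        return super.get(key);
    }

    @Override
    public V remove(Object key) {
        lastAccess.remove(key);
        return super.remove(key);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxEntries) {
            lastAccess.remove(eldest.getKey());
            return true;
        }
        return false;
    }
}
```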

score 0 · Answer 5 · answered Feb 19 '12 at 21:28

First of all, the code you presented is incomplete, because there is no get(key) on HashSet (so I assume you mean some kind of Map instead), and your code does not mention any "marking." There are also many ways to do caching, and it is difficult to pick out the best solution without knowing what you are trying to cache and why.

When implementing a cache, it is usually assumed that the data structure will be accessed concurrently by multiple threads. So the first thing you need to do is use a backing data structure that is thread-safe. HashMap is not thread-safe, but ConcurrentHashMap is. There are also a number of other concurrent Map implementations out there, for example in Guava, Javolution and high-scale lib. There are other ways to build caches besides maps, and their usefulness depends on your use case. Regardless, you will most likely need to make the backing data structure thread-safe, even if you decide you don't need the background thread and instead evict expired objects when you attempt to retrieve them from the cache, or let the GC remove the entries by using SoftReferences.
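For the SoftReference option, a minimal sketch (the name `SoftValueCache` is hypothetical): values are wrapped in `SoftReference`s, which the garbage collector may clear when memory is tight, and map entries whose value has been cleared are dropped lazily on lookup. There is no time-based expiry here, and when references get cleared is up to the JVM.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Values are held softly: the GC may clear them under memory pressure,
// and stale map entries are removed when a lookup finds them.
class SoftValueCache<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key, ref); // referent was collected; drop the entry
        }
        return value;
    }
}
```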

Once you have made the internals of your cache thread-safe, you can simply fire up a new (most likely daemonized) thread that periodically sweeps/iterates the cache and removes old entries. The thread would do this in a loop (until interrupted, if you want to be able to stop it again) and then sleep for some amount of time after each sweep.

However, you should consider whether it is worth building your own cache implementation. Writing thread-safe code is not easy, and I recommend that you study the subject before endeavouring to write your own cache. I can recommend the book Java Concurrency in Practice.

The easier way to go about this is, of course, to use an existing cache implementation. There are many options available in Java-land, all with their own unique set of trade-offs.

EhCache and JCS are both general purpose caches that fit most caching needs one would find in a typical "enterprise" application.
Infinispan is a cache that is optimised for distributed use, and can thus cache more data than what can fit on a single machine. I also like its ConcurrentMap based API.
As others have mentioned, Google's Guava library has a Cache API, which is quite useful for smallish in-memory caches.

Since you want to limit the number of entries in the cache, you might be interested in an object-pool instead of a cache.

Apache Commons-Pool is widely used, and has APIs that resemble what you are trying to build yourself.
Stormpot, on the other hand, has a rather different API, and I am pretty much only mentioning it because I wrote it. It's probably not what you want, but who can be sure without knowing what you are trying to cache and why?
