First of all, your presented code is incomplete because there is no get(key)
on HashSet
(so I assume you mean some kind of Map
instead) and your code does not mention any "marking." There are also many ways to do caching, and it is difficult to pick out the best solution without knowing what you are trying to cache and why.
When implementing a cache, it is usually assumed that the data-structure will be accessed concurrently by multiple threads. So the first thing you will need to do, is to make use of a backing data-structure that is thread-safe. HashMap
is not thread-safe, but ConcurrentHashMap
is. There are also a number of other concurrent Map
implementations out there, namely in Guava, Javolution and high-scale lib. There are other ways to build caches besides maps, and their usefulness depends on your use case. Regardless, you will most likely need to make the backing data-structure thread-safe, even if you decide you don't need the background thread and instead evict expired objects upon attempting to retrieve them from the cache. Or letting the GC remove the entries by using SoftReference
s.
Once you have made the internals of your cache thread-safe, you can simply fire up a new (most likely daemonized) thread that periodically sweeps/iterates the cache and removes old entries. The thread would do this in a loop (until interrupted, if you want to be able to stop it again) and then sleep for some amount of time after each sweep.
However, you should consider whether it is worth it for you, to build your own cache implementation. Writing thread-safe code is not easy, and I recommend that you study it before endeavouring to write your own cache implementation. I can recommend the book Java Concurrency in Practice.
The easier way to go about this is, of course, to use an existing cache implementation. There are many options available in Java-land, all with their own unique set of trade-offs.
- EhCache and JCS are both general purpose caches that fit most caching needs one would find in a typical "enterprise" application.
- Infinispan is a cache that is optimised for distributed use, and can thus cache more data than what can fit on a single machine. I also like its
ConcurrentMap
based API.
- As others have mentioned, Googles Guava library has a Cache API, which is quite useful for smallish in-memory caches.
Since you want to limit the number of entries in the cache, you might be interested in an object-pool instead of a cache.
- Apache Commons-Pool is widely used, and has APIs that resemble what you are trying to build yourself.
- Stormpot, on the other hand, has a rather different API, and I am pretty much only mentioning it because I wrote it. It's probably not what you want, but who can be sure without knowing what you are trying to cache and why?