
I want to implement a lightweight caching data structure in Java. I was primarily thinking of building something from scratch around a `HashMap`, but I suspect there may be existing solutions.

The cache will sit between the application and the database layer, and I will probably refresh it every 5 minutes. The data are all in JSON format: series of events, each with a timestamp. The expiration rule is that the cache should only contain events whose timestamps fall within, say, [-10 min, +10 min] of the current time at insertion, not the time of insertion itself (I might insert multiple records into the cache at the same moment, but their timestamps can differ). So every time I grab data from the database, I need to do some JSON processing and insert/update the cache. It can be up to a few thousand JSON documents, so my scale is not very large.

What would be the best way to implement this cache? What data structure would you recommend?

UPDATE: A few things to note:

  • Data Structure: The events I grab from the database are sorted in time (hundreds of timed event streams per call). So my cache is effectively a time window over events: it keeps events within [-10 min, +10 min] of the current time at each insertion (every 5 minutes).

  • Insertion: Since the events I grab from the database are sorted in time, I'm not sure whether I should erase the whole cache and insert the new event streams, or pass the events to the cache and let it take care of updating itself based on the timestamps.

  • Retrieval: For a retrieval, I need the series of events falling into a [start_time, end_time] interval that is a subset of my cached window (so these intervals are shorter than 10 minutes). Please consider that when suggesting the right data structure.
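The requirements above (timestamp-keyed window, pruning on insertion, range retrieval) map naturally onto a sorted map. A minimal sketch follows; the `String` JSON payload, the 10-minute constant, and the assumption of a single refreshing thread are illustrative choices, not from the question:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Sketch of a time-window event cache keyed by event timestamp.
 * Assumes a single refresh thread calls put()/prune(); readers may
 * call range() concurrently thanks to ConcurrentSkipListMap.
 */
class TimeWindowCache {
    private static final Duration WINDOW = Duration.ofMinutes(10);

    // Events sorted by timestamp; multiple events may share one timestamp.
    private final NavigableMap<Instant, List<String>> events =
            new ConcurrentSkipListMap<>();

    /** Insert one event (here just its JSON string) under its timestamp. */
    public void put(Instant timestamp, String eventJson) {
        events.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(eventJson);
    }

    /** Drop everything outside [now - 10 min, now + 10 min]. */
    public void prune(Instant now) {
        // headMap/tailMap return live views, so clear() removes from the map.
        events.headMap(now.minus(WINDOW), false).clear();
        events.tailMap(now.plus(WINDOW), false).clear();
    }

    /** Range retrieval: all events with start <= timestamp <= end. */
    public List<String> range(Instant start, Instant end) {
        List<String> out = new ArrayList<>();
        for (List<String> bucket : events.subMap(start, true, end, true).values()) {
            out.addAll(bucket);
        }
        return out;
    }
}
```

The `subMap` call makes the [start_time, end_time] retrieval a cheap view over the sorted keys, which is the main reason to prefer a `NavigableMap` over a plain `HashMap` here.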

Tina J
    You can take a look on https://stackoverflow.com/questions/3802370/java-time-based-map-cache-with-expiring-keys – Amit Bera Jan 17 '19 at 04:12
    try google guava cache – TongChen Jan 17 '19 at 04:17
    [Caffeine](https://github.com/ben-manes/caffeine) supports a custom [expiration policy](https://github.com/ben-manes/caffeine/wiki/Eviction#time-based) which is amortized O(1) ([hierarchical timer wheel](https://www.confluent.io/blog/apache-kafka-purgatory-hierarchical-timing-wheels/)). – Ben Manes Jan 17 '19 at 06:45
  • Thanks. Will look at these. Please also look at my update in the question. – Tina J Jan 17 '19 at 09:58
    If you are reloading every 5m, you could just use a volatile `Map` field and replace it in one shot. The map itself would be unmodifiable, as you swap it rather than update. If that's all you need then a caching library is probably overly complex. – Ben Manes Jan 17 '19 at 17:18
  • @BenManes Right, I don't think we need something complex. A HashMap is totally fine. But I don't know how to get only the data within a subset time window. Should I get all items and just iterate over them, choosing the ones falling within the window? – Tina J Jan 17 '19 at 17:25
  • Yes, but if you only need those from the database you should write the SQL to only load that subset. – Ben Manes Jan 17 '19 at 17:27
  • Well, the notion is to somehow cache a larger portion locally. – Tina J Jan 17 '19 at 18:49
  • Writing it yourself is okay as a learning exercise, but for a serious app always use battle-tested library code like Guava cache or MapDB cache, which lets you offload to local disk. – simbo1905 Jan 17 '19 at 21:58
    @TinaJ Yes, I meant your DB retrieval query to populate the cache could return only the subset in the time range (e.g. the `between` operator). Regardless, you could store into a `NavigableMap` for efficient range lookups. – Ben Manes Jan 18 '19 at 00:43
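The swap-on-refresh idea from the comments (build a fresh map on each 5-minute reload and replace it in one shot through a `volatile` field) can be sketched as below; the key/value types and method names are illustrative assumptions:

```java
import java.time.Instant;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Sketch of the swap-on-refresh cache suggested in the comments: each
 * refresh publishes a new unmodifiable NavigableMap via a volatile field,
 * so readers always see a complete, consistent snapshot.
 */
class SwappableEventCache {
    private volatile NavigableMap<Instant, List<String>> snapshot =
            Collections.emptyNavigableMap();

    /** Called after each DB query; replaces the whole map atomically. */
    public void refresh(TreeMap<Instant, List<String>> freshlyLoaded) {
        snapshot = Collections.unmodifiableNavigableMap(freshlyLoaded);
    }

    /** Range lookup against whichever snapshot is currently published. */
    public NavigableMap<Instant, List<String>> range(Instant start, Instant end) {
        return snapshot.subMap(start, true, end, true);
    }
}
```

Because the map is never mutated after publication, no locking or expiry bookkeeping is needed; the 10-minute window is enforced simply by what the refresh query loads, which matches the "caching library is probably overly complex" point above.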

0 Answers