
I'm on a project that calls for high performance... and I was told to make as few database calls as possible and to keep more objects in JVM memory instead. Right.

So... It didn't shock me at first, but now I'm questioning the approach.

How can I know which is best?

On the one hand, I would have:

 - static Map <id1, id2>
 - static Map <id2, ObjectX>

Object X
 - id2
 - map <id1, ObjectY>

Object Y
 - id1

So basically, this data structure would let me get an ObjectY from an id1, and I would also be able to send back the whole ObjectX when needed.

You should also know that the structure is filled by a service call (A). Then, updates to ObjectY objects can happen through another service (B). Finally, another service can send back an ObjectX (C). That makes three services using the data.
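
To make it concrete, here is a minimal Java sketch of what I have in mind (the class and method names - InMemoryStore, fill, lookupY, lookupX - are made up, and ObjectX / ObjectY are reduced to just the fields listed above):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class ObjectY {
        long id1;
        // ... other fields
    }

    class ObjectX {
        long id2;
        Map<Long, ObjectY> ys = new ConcurrentHashMap<>();   // id1 -> ObjectY
    }

    class InMemoryStore {
        // the two static maps: id1 -> id2, and id2 -> ObjectX
        static final Map<Long, Long> ID1_TO_ID2 = new ConcurrentHashMap<>();
        static final Map<Long, ObjectX> BY_ID2 = new ConcurrentHashMap<>();

        // service A fills the structure
        static void fill(ObjectX x) {
            BY_ID2.put(x.id2, x);
            for (ObjectY y : x.ys.values()) {
                ID1_TO_ID2.put(y.id1, x.id2);
            }
        }

        // service B looks up (and then updates) an ObjectY from its id1
        static ObjectY lookupY(long id1) {
            Long id2 = ID1_TO_ID2.get(id1);
            if (id2 == null) return null;
            ObjectX x = BY_ID2.get(id2);
            return x == null ? null : x.ys.get(id1);
        }

        // service C sends back the whole ObjectX for an id2
        static ObjectX lookupX(long id2) {
            return BY_ID2.get(id2);
        }
    }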

On the other hand, I could have:

 - DB table T1 for ObjectY
 - DB join table T2 associating id1s and id2s
 - DB table T3 for ObjectX

Service A would make an insert into the tables. Service B would make an update in table T1. Service C would make a join between T2 and T1 to get all the ObjectY objects for an ObjectX.
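
Roughly, those three operations in JDBC would look like this (the table and column names - T1, T2, T3, id1, id2, payload - are only placeholders, not the real schema):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    class DbVersion {
        // service A inserts into the tables (an insert into T3 for ObjectX would go here too)
        void insert(Connection con, long id1, long id2, String payload) throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO T1 (id1, payload) VALUES (?, ?)")) {
                ps.setLong(1, id1);
                ps.setString(2, payload);
                ps.executeUpdate();
            }
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO T2 (id1, id2) VALUES (?, ?)")) {
                ps.setLong(1, id1);
                ps.setLong(2, id2);
                ps.executeUpdate();
            }
        }

        // service B updates an ObjectY row in T1
        void update(Connection con, long id1, String payload) throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(
                    "UPDATE T1 SET payload = ? WHERE id1 = ?")) {
                ps.setString(1, payload);
                ps.setLong(2, id1);
                ps.executeUpdate();
            }
        }

        // service C joins T2 and T1 to get all ObjectY rows for an ObjectX
        void selectForX(Connection con, long id2) throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT t1.id1, t1.payload FROM T1 t1 JOIN T2 t2 ON t1.id1 = t2.id1 WHERE t2.id2 = ?")) {
                ps.setLong(1, id2);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // build an ObjectY from rs.getLong("id1") and rs.getString("payload")
                    }
                }
            }
        }
    }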

In my opinion, the DB version is more flexible... I am unsure about the performance, but I would say the DB version shouldn't be slower than the "memory" version. Finally, doesn't the "memory" version carry some risks?

I hope it seems obvious to some of you which version I should choose and why... I'm hoping this won't turn into a debate; I'm looking for ways to work out which is quicker...

Adrien Gorrell

2 Answers


What you are doing is building a cache. And it's a hugely popular and proven technique, with many implementations ranging from simple Map usage to full vendor products, support for caching across servers, and all sorts of bells and whistles.

And, done well, you should indeed get all sorts of performance improvements. But here is the main challenge in caching: how do you know when a cache entry is "stale", i.e. the DB content has changed, but your cache doesn't know about it?

You might have an obvious answer here - you might be caching stuff that actually won't change. Cache invalidation is the proper term: deciding when to refresh an entry because you know it's stale and you need fresh content.

I think all the trade-offs that you rightly recognise are ones you personally need to weigh up, with the extra confidence that you're not "missing something".

One final thought - will you have enough memory to cache everything? Maybe you need to limit it, e.g. to the top 100,000 objects that get requested. Looking at third-party caching tools like EHCache or Guava could be useful:

https://code.google.com/p/guava-libraries/wiki/CachesExplained
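
For example, a Guava LoadingCache gives you a size cap plus a crude form of invalidation (expiry) in a few lines. This is only a sketch - it reuses the ObjectY type from your question and assumes a hypothetical loadFromDatabase method that fetches an entry on a cache miss:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.TimeUnit;

    class CachedStore {
        private final LoadingCache<Long, ObjectY> cache = CacheBuilder.newBuilder()
                .maximumSize(100000)                    // keep roughly the 100,000 most recently used entries
                .expireAfterWrite(10, TimeUnit.MINUTES) // crude invalidation: entries expire after 10 minutes
                .build(new CacheLoader<Long, ObjectY>() {
                    @Override
                    public ObjectY load(Long id1) {
                        return loadFromDatabase(id1);   // hypothetical DB fetch on a cache miss
                    }
                });

        ObjectY get(long id1) throws ExecutionException {
            return cache.get(id1);                      // returns the cached value, loading it on a miss
        }

        void onUpdate(long id1) {
            cache.invalidate(id1);                      // explicit invalidation when you know the row changed
        }

        private ObjectY loadFromDatabase(long id1) {
            return null;                                // placeholder - would query the DB here
        }
    }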

Brian

Retrieving an object stored in memory will take on the order of hundreds of nanoseconds (less if it has been accessed recently and so is in a CPU cache). Of course this latency will vary based on your platform, but it is a ballpark figure for comparison. Retrieving the same information from a database - again it depends on many factors, such as whether the database is on the same machine - will take on the order of milliseconds at least, i.e. tens of thousands of times slower.

Which is quicker? You will need to be more specific about which operations you will be measuring for speed, but the in-memory version will be faster in pretty much all cases. The database version gives different advantages - persistence, access from different machines, transactional commit / rollback - but speed is not one of them, not compared with an in-memory lookup.
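
If you want to put numbers on it yourself, even something as crude as the sketch below will show the gap. It assumes your in-memory map of ObjectY and an open JDBC connection to a table T1 (the names are placeholders); for serious figures you would use a proper benchmark harness such as JMH, with warm-up and many iterations:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.Map;

    class QuickTiming {
        // time a single in-memory lookup: typically on the order of hundreds of nanoseconds
        static long timeMapGet(Map<Long, ObjectY> map, long id1) {
            long start = System.nanoTime();
            map.get(id1);
            return System.nanoTime() - start;
        }

        // time a single DB lookup: typically on the order of milliseconds
        static long timeDbGet(Connection con, long id1) throws SQLException {
            long start = System.nanoTime();
            try (PreparedStatement ps = con.prepareStatement("SELECT payload FROM T1 WHERE id1 = ?")) {
                ps.setLong(1, id1);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                }
            }
            return System.nanoTime() - start;
        }
    }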

Yes, the in-memory version has risks - basically, if the machine is powered down (or the process exits for whatever reason: memory corruption, an uncaught exception) then the data is lost, i.e. the in-memory solution does not have 'persistence', unlike a database.

Graham Griffiths