4

The project I am working on is facing a design dilemma on how to get objects and collections of objects from a database. Sometimes it is useful to buffer *all* objects from the database with its properties into memory, sometimes it is useful to just set an object id and query its properties on-demand (1 db call per object to get all properties). And in many cases, collections need to support both buffering objects into memory and being initialized with minimum information for on-demand access. After all, not everything can be buffered into memory and not everything can be read on-demand. It is a ubiquitous memory vs IO problem.

Did anyone have to face the same problem? How did affect your design? What are the tough lessons learned? Any other thoughts and recommendations?

EDIT: my project is a classic example of a business layer dll, consumed by a web application, web services and desktop application. When a list of products is requested for a desktop application and displayed only by product name, it is ok to have this sequence of steps to display all products (lets say there is a million of products in the database):
1. One db call to get all product names
2. One db call to get all product information if the user clicks on the product to see details (on-demand access)

However, if this same API is going to be consumed by a web service to display all products with details, the network traffic will become chatty. The better sequence in this case would be:
1. What the heck, buffer all products and product fields from just one db call (in this case buffering 1 million products also looks scary)

kateroh
  • 4,382
  • 6
  • 43
  • 62

2 Answers2

6

It depends how often the data changes. It is common to cache static and near static data (usually with a cache expiry window).

Databases are already designed to cache data, so provided network I/O is not a bottleneck, let the database do what it is good at.

Have you looked at some of the caching technologies available?

Community
  • 1
  • 1
Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
  • Thanks for the links! Velocity article was quite enjoyable to read... Static data is already cached in our in-house cache. Other data that is not static is not cached at all and not even completely read into memory (fields are accessed on-demand, see the edit). In this case of dynamic data, it's hard to decide whether it should be read into memory completely or partially. – kateroh Feb 08 '11 at 08:16
2

This is not a popular position, but avoid all caching unless absolutely necessary or if you know for sure immediately that you're going to need to "Internet-scale." Tried to scale out a layered-cache atop the database? Are you going to write-through the cache and only read or wait for an LRU object to write changes? what happens when another app or web services tier sit atop the DB and get inconsistent reads?

Most modern databases already have cache and likely can implement them better than you, just determine if you want to hit the DB wire every time you need something. In a large majority of the cases, the DB'll perform just fine and you'll keep your consistency. BASE and CAP theory is nice and fun to talk about and imagine, but you sometimes just can't beat the cost-to-market of just hitting the good old database. Stress test and determine your hotspots, implement your cache conservatively if needed.

Jé Queue
  • 10,359
  • 13
  • 53
  • 61
  • thanks for the advice. the original problem is not just a matter of a cache vs database hits (the project does have a cache already for less frequently changing objects). it is also a question of whether get-all methods should read all data into memory instead of just getting minimum data and returning the rest on demand (also less chances of the data being stale) – kateroh Feb 08 '11 at 08:20
  • @kateroh - Again, you're getting back to the same cache issue, which is that all can be well if you ALWAYS go through cache. If there are other layers that make durable data without invalidating the cache, you will end up with inconsistent results. The web being (mostly) stateless is often a mismatch with transactional systems like this, but I've been BURNED too many times with cache inconsistency and coherence overhead. – Jé Queue Feb 08 '11 at 12:43
  • @kateroh - Personally, KISS it and query your DB when needed. – Jé Queue Feb 08 '11 at 12:44