SQL Query minimizing/caching in a C++ application

Question

I'm writing a project in C++/Qt and it is able to connect to any type of SQL database supported by the QtSQL (http://doc.qt.nokia.com/latest/qtsql.html). This includes local servers and external ones.

However, when the database in question is external, the speed of the queries starts to become a problem (slow UI, ...). The reason: Every object that is stored in the database is lazy-loaded and as such will issue a query every time an attribute is needed. On average about 20 of these objects are to be displayed on screen, each of them showing about 5 attributes. This means that for every screen that I show about 100 queries get executed. The queries execute quite fast on the database server itself, but the overhead of the actual query running over the network is considerable (measured in seconds for an entire screen).

I've been thinking about a few ways to solve the issue, the most important approaches seem to be (according to me):

Make fewer queries
Make queries faster

Tackling (1)

I could find some sort of way to delay the actual fetching of the attribute (start a transaction), and then when the programmer writes endTransaction() the database tries to fetch everything in one go (with SQL UNION or a loop...). This would probably require quite a bit of modification to the way the lazy objects work but if people comment that it is a decent solution I think it could be worked out elegantly. If this solution speeds up everything enough then an elaborate caching scheme might not even be necessary, saving a lot of headaches
I could try pre-loading attribute data by fetching it all in one query for all the objects that are requested, effectively making them non-lazy. Of course in that case I will have to worry about stale data. How would I detect stale data without at least sending one query to the external db? (Note: sending a query to check for stale data for every attribute check would provide a best-case 0x performance increase and a worst-caste 2x performance decrease when the data is actually found to be stale)

Tackling (2)

Queries could for example be made faster by keeping a local synchronized copy of the database running. However I don't really have a lot of possibilities on the client machines to run for example exactly the same database type as the one on the server. So the local copy would for example be an SQLite database. This would also mean that I couldn't use an db-vendor specific solution. What are my options here? What has worked well for people in these kinds of situations?

Worries

My primary worries are:

Stale data: there are plenty of queries imaginable that change the db in such a way that it prohibits an action that would seem possible to a user with stale data.
Maintainability: How loosely can I couple in this new layer? It would obviously be preferable if it didn't have to know everything about my internal lazy object system and about every object and possible query

Final question

What would be a good way to minimize the cost of making a query? Good meaning some sort of combination of: maintainable, easy to implement, not too aplication specific. If it comes down to pick any 2, then so be it. I'd like to hear people talk about their experiences and what they did to solve it.

As you can see, I've thought of some problems and ways of handling it, but I'm at a loss for what would constitute a sensible approach. Since it will probable involve quite a lot of work and intensive changes to many layers in the program (hopefully as few as possible), I thought about asking all the experts here before making a final decision on the matter. It is also possible I'm just overlooking a very simple solution, in which case a pointer to it would be much appreciated!

Assuming all relevant server-side tuning has been done (for example: MySQL cache, best possible indexes, ...)

*Note: I've checked questions of users with similar problems that didn't entirely satisfy my question: Suggestion on a replication scheme for my use-case? and Best practice for a local database cache? for example)

If any additional information is necessary to provide an answer, please let me know and I will duly update my question. Apologies for any spelling/grammar errors, english is not my native language.

Note about "lazy"

A small example of what my code looks like (simplified of course):

QList<MyObject> myObjects = database->getObjects(20, 40); // fetch and construct object 20 to 40 from the db

// ...some time later

// screen filling time!
foreach (const MyObject& o, myObjects) {
    o->getInt("status", 0);  // == db request
    o->getString("comment", "no comment!"); // == db request
    // about 3 more of these
}

If the objects on screens are somewhat constant why not just map all attributes to screens and then do "Select all attributes needed for screen x". That way you'd have 1 query with "big" result set. After that the only thing you can improve is the query it self. — Maiku Mori, Aug 01 '11 at 14:34
I'm not sure if I understand your use of lazy loading.. Do you need to load all 20 objects (with 5 attributes each) at the same time? — Tim Meyer, Aug 01 '11 at 14:48
@Maiku Yes this is one of the things I've thought about, I envisioned something similar in the second bullet of "tackling (1)", it might be one of the "simpler" solutions but would still be quite ugly to implement. Currently this is one of my favourite ways but I'm waiting for more suggestions. — Aktau, Aug 01 '11 at 17:10
@TimMeyer Yes, the main view of the application loads them at the same time. What happens is: the main view requests 20 objects, the database fetches the core attributes from the DB in one query and the application constructs 20 objects, handing them off to the main view. However, the main view will depending on certain settings, ask for a lot more than the core attributes, each of these non-core attributes (a sort of metadata) has to be requested seperately from the db. That's where the slowness comes from. Note that the main view is not the only possible view, sometimes ALL objects are neede — Aktau, Aug 01 '11 at 17:12
If you need to load those objects at the same time, you should do one query per type, e.g. one for the objects, one for the attributes (or maybe more than one if really required). If the 100 queries would get executed at the same time your new two (or more) queries are executed anyway, there should be no difference in terms of stale data, but a good difference in terms of execution time. Implementing this might be hard if you have generic lazy loading algorithms, but should be worth the effort. — Tim Meyer, Aug 02 '11 at 07:42
@TimMeyer Yes that's exactly it :). This has been my main path of thought for some time, and you've also pinpointed the exact problem. I do have generic lazy loading algorithms (i.e.: every attribute is treated in the same way, thank god for QVariant and templates). There would probably be many changes necessary to switch over to an optional non-lazy-preload. And I would also need to be able to switch over to lazy-loading when necessary. On top of that I'm still not sure how to do the validity checks or rather make them performant, it seems logical that I can't avoid at least one DB hit... — Aktau, Aug 02 '11 at 08:33
I don't see how this would be non-lazy. I understood your program like this: If you start your application, data won't be loaded. If you open any view, the data for this view has to be lazy loaded. Right now, this is performed by a generic (maybe recursive?) algorithm which probably tells each object to load its data (or similar). My intention was to alter the way (not the time) data is loaded: First collect information on which objects need to be loaded. Use the information about all objects to load in order to make one query per object type, then fill all the objects. — Tim Meyer, Aug 02 '11 at 11:49
@TimMeyer: I'm going to expand my question and show what I mean by lazy because that's easier to read. I'm pretty sure I understand what you're saying and I agree with you that it's one of two big ways in which the speed could be improved and possibly the easiest. The thing is, there is no "filling" of objects. All objects request the data from the database whenever the attribute is needed. Yes, even if the same attribute is needed 2000 times in a row, it will be fetched from the db 2000 times. At least that's the current design. It's advantages: simple and always up-to-date. — Aktau, Aug 02 '11 at 17:13

score 2 · Accepted Answer · answered Aug 01 '11 at 14:19

At first glance it looks like you have two conflicting goals: Query speed, but always using up-to-date data. Thus you should probably fall back to your needs to help decide here.

1) Your database is nearly static compared to use of the application. In this case use your option 1b and preload all the data. If there's a slim chance that the data may change underneath, just give the user an option to refresh the cache (fully or for a particular subset of data). This way the slow access is in the hands of the user.

2) The database is changing fairly frequently. In this case "perhaps" an SQL database isn't right for your needs. You may need a higher performance dynamic database that pushes updates rather than requiring a pull. That way your application would get notified when underlying data changed and you would be able to respond quickly. If that doesn't work however, you want to concoct your query to minimize the number of DB library and I/O calls. For example if you execute a sequence of select statements your results should have all the appropriate data in the order you requested it. You just have to keep track of what the corresponding select statements were. Alternately if you can use a looser query criteria so that it returns more than one row for your simple query that ought to help performance as well.

You're right about me having somewhat conflicting requirements (or so it would seem). I've been thinking about what you asked and this is a general observation of the usage: The users are generally NOT editing the same objects, and probably not even on the same screen. So that would mean I'm dealing with a static database type? Also, it still wouldn't be good that if it does happen (a conflict), several illegal actions can be performed — Aktau, Aug 01 '11 at 14:34