2

I have an idea I have yet to implement, because I have some fear I may be barking up the wrong tree... mainly because Googling on the topic returns so few results.

Basically I have some SQL queries that are slow, in large part because they have subqueries that are time-consuming. For example, they might do things like "give me a count of all bicycles that are red and ridden by boys between the ages of 10-15". This is expensive as it sloshes through all of the bicycles, but the end result is a single number. And, in my case, I don't really need that number to be 100% up to date.

The ultimate solution for problems of this sort seems to be to apply an OLAP-based engine to pre-cache these permutations. However, in my case I'm not really trying to slice and dice the data around a ton of metrics, and I'd love not to have to complicate my architecture with yet another process/datastore running.

So... my idea was basically memoizing these subqueries in the database. I might have a table called "BicycleStatistics" and it might store the output of that subquery above as a name value pair of it's inputs and outputs.

Ex name: "c_red_g_male_a_10-15" value: 235

And have a mechanism that memoizes those values to that table as the queries are run.

Has anyone been in this situation and tried anything similar? The reason I think a solution like this is valuable over the "throw a lot of RAM in your DB and let the database handle it" is (A) my database is bigger than the amount of RAM I can conveniently throw at it, and (B) the database is going to ensure I get the exact right number for these statistics, and my big win, above, is that I'm ok with the numbers being a day or two out of date.

Thanks for any thoughts/feedback.

Tom

Tom Lianza
  • 4,012
  • 4
  • 41
  • 50
  • 1
    Have you looked at the query plans to see why your queries are so slow? It could be that you are using an inefficient method, for example a correlated subquery instead of a JOIN, or you're missing an index somewhere. – Mark Byers Feb 17 '10 at 00:58
  • 1
    "The ultimate solution for problems of this sort" - is to first check you have the right indexes for your query workload – Mitch Wheat Feb 17 '10 at 01:13
  • Indeed, I've spent a ton of time making sure the queries are fully using indexes. The example I'm using above is a bit contrived, but in my case it's as if I have a few million bicycles... and I'm working on getting it to scale so that it can be run several times a second. – Tom Lianza Feb 17 '10 at 08:11

1 Answers1

2

Materialized views are a way of achieving this requirement, if your DBMS supports them.

Tony Andrews
  • 129,880
  • 21
  • 220
  • 259