Possible duplicate:
Database design: Calculating the Account Balance
I work with a web app which stores transaction data (e.g. "amount x on date y", but more complicated) and provides calculation results based on details of all relevant transactions[1]. We are investing a lot of time into ensuring that these calculations perform efficiently, as they are an interactive part of the application: i.e. a user clicks a button and waits to see the result. We are confident that, for the current levels of data, we can optimise the database fetching and calculation to complete in an acceptable amount of time. However, I am concerned that the time taken will still grow linearly as the number of transactions grows[2]. I'd like to be able to say that we could handle an order of magnitude more transactions without excessive performance degradation.
I am looking for effective techniques, technologies, patterns or algorithms which can improve the scalability of calculations based on transaction data.
There are, however, real and significant constraints for any suggestion:
- We currently have to support two highly incompatible database implementations, MySQL and Oracle. Thus, for example, using database-specific stored procedures would have roughly twice the maintenance cost.
- The actual transactions are more complex than the example transaction given, and the business logic involved in the calculation is complicated and regularly changing. Thus having the calculations stored directly in SQL is not something we can easily maintain.
- Any of the transactions previously saved can be modified at any time (e.g. the date of a transaction can be moved a year forward or back) and calculations are expected to be updated instantly. This has a knock-on effect for caching strategies.
- Users can query across a large space, in several dimensions. To explain, consider being able to calculate a result as it would stand at any given date, for any particular transaction type, where transactions are filtered by several arbitrary conditions. This makes it difficult to pre-calculate the results a user would want to see.
- One instance of our application is hosted on a client's corporate network, on their hardware. Thus we can't easily throw money at the problem in terms of CPUs and memory (even if those are actually the bottleneck).
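To make the shape of the problem concrete, here is a heavily simplified sketch of the kind of calculation involved. All names (`Transaction`, `result_as_of`, `conditions`) are hypothetical and made up for illustration, and a plain sum stands in for the real business logic, which is far more complicated. The point is only that every query scans all transactions, so the cost is linear in their number.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class Transaction:
    # Hypothetical, simplified record: "amount x on date y" plus a type.
    amount: float
    on: date
    kind: str


def result_as_of(
    transactions: Iterable[Transaction],
    as_of: date,
    kind: str,
    conditions: Sequence[Callable[[Transaction], bool]] = (),
) -> float:
    """Sum the amounts of all transactions of the given kind dated on or
    before `as_of` that satisfy every extra condition.

    This is a full scan: one pass over every transaction per query.
    """
    total = 0.0
    for t in transactions:
        if t.on <= as_of and t.kind == kind and all(c(t) for c in conditions):
            total += t.amount
    return total
```

Because the date, the type and the extra conditions are all chosen by the user at query time, and any stored transaction can be edited later, a result computed once cannot simply be kept around and reused.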
I realise this is very open-ended and general, however...
Are there any suggestions for achieving a scalable solution?
[1] Where 'relevant' can be: the date queried for; the type of transaction; the type of user; formula selection; etc.
[2] Admittedly, this is an improvement over the previous performance, where an ORM's n+1 problems saw the time taken grow either exponentially, or at least with a much steeper gradient.