
Possible duplicate:
Database design: Calculating the Account Balance

I work on a web app which stores transaction data (e.g. like "amount x on date y", but more complicated) and provides calculation results based on the details of all relevant transactions[1]. We are investing a lot of time into ensuring that these calculations perform efficiently, as they are an interactive part of the application: i.e. a user clicks a button and waits to see the result. We are confident that, for the current levels of data, we can optimise the database fetching and calculation to complete in an acceptable amount of time. However, I am concerned that the time taken will still grow linearly as the number of transactions grows[2]. I'd like to be able to say that we could handle an order of magnitude more transactions without excessive performance degradation.

I am looking for effective techniques, technologies, patterns or algorithms which can improve the scalability of calculations based on transaction data.

There are, however, real and significant constraints on any suggestion:

  • We currently have to support two highly incompatible database implementations, MySQL and Oracle. Thus, for example, using database-specific stored procedures has roughly twice the maintenance cost.
  • The actual transactions are more complex than the example given, and the business logic involved in the calculation is complicated and changes regularly. Thus storing the calculations directly in SQL is not something we can easily maintain.
  • Any of the transactions previously saved can be modified at any time (e.g. the date of a transaction can be moved a year forward or back) and calculations are expected to be updated instantly. This has a knock-on effect for caching strategies.
  • Users can query across a large space, in several dimensions. To explain, consider being able to calculate a result as it would stand at any given date, for any particular transaction type, where transactions are filtered by several arbitrary conditions. This makes it difficult to pre-calculate the results a user would want to see.
  • One instance of our application is hosted on a client's corporate network, on their hardware. Thus we can't easily throw money at the problem in terms of CPUs and memory (even if those are actually the bottleneck).

I realise this is very open-ended and general, however...

Are there any suggestions for achieving a scalable solution?

[1] Where 'relevant' can be: the date queried for; the type of transaction; the type of user; formula selection; etc.
[2] Admittedly, this is an improvement over the previous performance, where an ORM's n+1 problems saw the time taken grow either exponentially, or at least with a much steeper gradient.

Grundlefleck

2 Answers


I have worked against similar requirements, and have some suggestions. A lot of this depends on what is possible with your data. It is difficult to make every case imaginable quick, but you can optimize for the common cases and have enough hardware grunt available for the others.

Summarise

We create summaries on a daily, weekly and monthly basis. For us, most of the transactions happen in the current day, but old transactions can also change. We keep a batch record and, under it, the individual transaction records. Each batch has a status to indicate whether the transaction summary (in table batch_summary) can be used. If an old transaction in a summarised batch changes, the batch is flagged, as part of the same database transaction, to indicate that the summary is not to be trusted. A background job will re-calculate the summary later.
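
To make the shape of this concrete, here is a minimal sketch of the tables involved. Only batch_summary and the idea of a status flag come from the description above; every other table name, column name and type is an assumption for illustration, and the details would differ between MySQL and Oracle.

    -- Sketch only: names other than batch_summary are invented, and the
    -- types are kept to a roughly portable MySQL/Oracle subset.
    CREATE TABLE batch (
        batch_id    INTEGER PRIMARY KEY,
        batch_date  DATE NOT NULL,
        summary_ok  CHAR(1) DEFAULT 'N' NOT NULL  -- 'Y' = summary trusted
    );

    CREATE TABLE transaction_record (
        txn_id      INTEGER PRIMARY KEY,
        batch_id    INTEGER NOT NULL REFERENCES batch (batch_id),
        txn_type    VARCHAR(20) NOT NULL,
        amount      DECIMAL(15,2) NOT NULL
    );

    CREATE TABLE batch_summary (
        batch_id     INTEGER NOT NULL REFERENCES batch (batch_id),
        txn_type     VARCHAR(20) NOT NULL,
        total_amount DECIMAL(15,2) NOT NULL,
        PRIMARY KEY (batch_id, txn_type)
    );

    -- When an old transaction changes, flag its batch in the same database
    -- transaction; the background job recalculates the summary later.
    -- (:changed_batch_id is an Oracle-style bind variable; use ? in JDBC/MySQL.)
    UPDATE batch SET summary_ok = 'N' WHERE batch_id = :changed_batch_id;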

Our software then uses the summary when possible and falls back to the individual transactions where there is no summary.
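
As a sketch of that fallback, using the assumed tables above, a per-type total can take pre-computed rows from trusted batches and raw transaction rows from flagged ones:

    -- Illustrative only: summaries for trusted batches, raw rows otherwise.
    SELECT txn_type, SUM(total_amount) AS total
    FROM (
        SELECT s.txn_type, s.total_amount
        FROM   batch_summary s
        JOIN   batch b ON b.batch_id = s.batch_id
        WHERE  b.summary_ok = 'Y'
        UNION ALL
        SELECT t.txn_type, t.amount
        FROM   transaction_record t
        JOIN   batch b ON b.batch_id = t.batch_id
        WHERE  b.summary_ok = 'N'
    ) combined
    GROUP BY txn_type;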

We played around with Oracle's materialized views, but ended up rolling our own summary process.

Limit the Requirements

Your requirements sound very wide. There can be a temptation to put all the query fields on a web page and let the users choose any combination of fields and output results. This makes it very difficult to optimize. I would suggest digging deeper into what they actually need to do, or have done in the past. It may not make sense to query on very unselective dimensions.

In our application, for certain queries we limit the date range to not more than 1 month. We have also aligned some features to the date-based summaries: e.g. you can get results for the whole of Jan 2011, but not for 5-20 Jan 2011.
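
For example, if a monthly_summary table existed (a name invented for this sketch, one row per month and transaction type), a whole-month request becomes a trivial lookup instead of a scan over transactions:

    -- Hypothetical monthly_summary table; DATE literal syntax works in
    -- both Oracle and MySQL.
    SELECT txn_type, total_amount
    FROM   monthly_summary
    WHERE  summary_month = DATE '2011-01-01';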

Provide User Interface Feedback for Slow Operations

On occasion we have found it difficult to optimize some things to take less than a few minutes. We shunt such a job off to a background server rather than have a very slow-loading web page. The user can fire off a request and go about their business while we get the answer.
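
One simple way to implement that hand-off, sketched here entirely with invented names, is a request table: the web tier inserts a row, a background worker picks up pending rows and does the slow calculation, and the page polls the status column.

    -- Hypothetical hand-off table; the worker claims PENDING rows and
    -- marks them DONE along with a pointer to the stored result.
    CREATE TABLE report_request (
        request_id   INTEGER PRIMARY KEY,
        requested_by VARCHAR(50) NOT NULL,
        params       VARCHAR(4000) NOT NULL,  -- serialised query parameters
        status       VARCHAR(10) DEFAULT 'PENDING' NOT NULL,
        result_ref   VARCHAR(200)             -- where the result was stored
    );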

WW.

I would suggest using Materialized Views. A Materialized View allows you to store the result of a view as you would a table, so all of the complex queries you need are pre-calculated before the user runs them.

The tricky part, of course, is updating the Materialized View when the tables it is based on change. There's a nice article about it here: Update materialized view when underlying tables change.

Materialized Views are not (yet) available in MySQL without plugins, and are horribly complicated to implement by hand. However, since you have Oracle, I would suggest checking out the link above for how to add a Materialized View in Oracle.
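
For reference, a minimal Oracle-only sketch might look like this, using an illustrative transaction_record(txn_type, amount) table (not from your question). Fast refresh on commit requires a materialized view log on the base table, and SUM needs the accompanying COUNT columns in the view.

    -- Oracle-only sketch: a fast-refresh-on-commit aggregate materialized view.
    CREATE MATERIALIZED VIEW LOG ON transaction_record
        WITH SEQUENCE, ROWID (txn_type, amount)
        INCLUDING NEW VALUES;

    CREATE MATERIALIZED VIEW txn_totals_mv
        BUILD IMMEDIATE
        REFRESH FAST ON COMMIT
    AS
    SELECT txn_type,
           SUM(amount)   AS total_amount,
           COUNT(amount) AS amount_count,  -- required for fast refresh of SUM
           COUNT(*)      AS txn_count
    FROM   transaction_record
    GROUP  BY txn_type;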

PyKing
  • Thanks for your answer. Unfortunately the sketchy support across database vendors is a problem, since we're actively supporting both Oracle and MySQL. – Grundlefleck Aug 26 '11 at 09:35