4

A database has a transactions table with columns: account_id, date, transaction_value (signed integer). Another table (account_value) stores the current total value of each account, which is the sum of all transaction_values per account. It is updated with a trigger on the transactions table (i.e., INSERTs, UPDATEs and DELETEs to transactions fire the trigger to change the account_value.)

A new requirement is to calculate the account's total transaction value only over the last 365 days. Only the current running total is required, not previous totals. This value will be requested often, almost as often as the account_value.

How would you implement this "sliding window sum" efficiently? A new table is ok. Is there a way to avoid summing over a year's range every time?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
user1147339
  • 41
  • 1
  • 3

5 Answers5

2

This can be done with standard windowing functions:

SELECT account_id,
       sum(transaction_value) over (partition by account_id order by date)
FROM transactions

The order by inside the over() claues makes the sum a "sliding sum".

For the "only the last 356 days" you'd need a second query that will limit the rows in the WHERE clause.

The above works in PostgreSQL, Oracle, DB2 and (I think) Teradata. SQL Server does not support the order by in the window definition (the upcoming Denali version will AFAIK)

  • FYI, you're correct about OVER clause limitations in SQL Server with aggregate functions. Only PARTITION BY can be used – gbn Jan 13 '12 at 09:47
  • @gbn: you mean for SQL Server? –  Jan 13 '12 at 09:48
  • 1
    The OP currently seems to be maintaining a track of the running totals over time. It would seem to imply that reports being run may pull these running totals for multiple different days at a time. Such as viewing the changes in the running over a week, or month, or even year. – MatBailie Jan 13 '12 at 09:51
1

As simple as this?

SELECT
   SUM(transaction_value), account_id
FROM
   transactions t
WHERE
   -- SQL Server, Sybase       t.DATE >= DATEADD(year, -1, GETDATE())
   -- MySQL            t.DATE >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
GROUP BY
   account_id;

You may want to remove the time component from the date expressions using DATE (MySQL) or this way in SQL Server

Community
  • 1
  • 1
gbn
  • 422,506
  • 82
  • 585
  • 676
  • 1
    Sybase ASE is the first as well :P – aF. Jan 13 '12 at 09:32
  • 1
    @aF. I think the OP wants a report showing the year_to_date running total for every day. So a report covering a year would have 365 different running totals over different 365 day ranges, and that's per account. It really is a headache, and a trigger based caching of the current values if often Much more efficient. – MatBailie Jan 13 '12 at 09:48
1

If queries of the transactions table are more frequent than inserts to the transactions table, then perhaps a view is the way to go?

jon
  • 5,986
  • 5
  • 28
  • 35
0

You are going to need a one-off script to populate the existing table with values for the preceding year for each existing record - that will need to run for the whole of the previous year for each record generated.

Once the rolling year column is populated, one alternative to summing the previous year would be to derive each new record's value as the previous record's rolling year value, plus the transaction value(s) since the last update, minus the transaction values between one year prior to the last update and one year ago from now.

I suggest trying both approaches against realistic test data to see which will perform better - I would expect summing the whole year to perform at least as well where data is relatively sparse, while the difference method may work better if data is to be frequently updated on each account.

0

I'll avoid any actual SQL here as it varies a lot depending on the variety of SQL that you are using.


You say that you have a trigger to maintain the existing running total.

I presume that it also (or perhaps a nightly process) creates new daily records in the account_value table. Then INSERTs, UPDATEs and DELETEs fire the trigger to add or subtract from the existing running total?

The only changes you need to make are:
- add a new field, "yearly_value" or something
- have the existing trigger update that in the same way as the existing field
- use gbn's type of answer to create today's records (or however far you backdate)
- but initialise each new daily record in a slightly different way...

When you insert a new row for a new day, it should be initialised to yesterday's value - the value 365 days ago. After that, the behavior should be identical to what you're already used to.

MatBailie
  • 83,401
  • 18
  • 103
  • 137