I have some data that looks like this:

+---+--------+-------------+---------------+--------------+
|   |   A    |      B      |       C       |      D       |
+---+--------+-------------+---------------+--------------+
| 1 | row_id | disposal_id | excess_weight | total_weight |
| 2 | 1      | 1           | 0             | 30           |
| 3 | 2      | 1           | 10            | 30           |
| 4 | 3      | 1           | 0             | 30           |
| 5 | 4      | 2           | 5             | 50           |
| 6 | 5      | 2           | 0             | 50           |
| 7 | 6      | 2           | 15            | 50           |
| 8 | 7      | 2           | 5             | 50           |
| 9 | 8      | 2           | 5             | 50           |
+---+--------+-------------+---------------+--------------+

And I am transforming it to look like this:

+---+--------+-------------+---------------+--------------+
|   |   A    |      B      |       C       |      D       |
+---+--------+-------------+---------------+--------------+
| 1 | row_id | disposal_id | excess_weight | total_weight |
| 2 | 1      | 1           | 0             | 30           |
| 3 | 2      | 1           | 10            | 30           |
| 4 | 3      | 1           | 0             | 20           |
| 5 | 4      | 2           | 5             | 50           |
| 6 | 5      | 2           | 0             | 45           |
| 7 | 6      | 2           | 15            | 45           |
| 8 | 7      | 2           | 5             | 30           |
| 9 | 8      | 2           | 5             | 25           |
+---+--------+-------------+---------------+--------------+

Basically, I need to update the total_weight column by subtracting the sum of the excess_weight values from previous rows (lower row_id) that belong to the same disposal_id.

I'm currently using a cursor because it's faster than the other solutions I've tried (CTE, triangular join, CROSS APPLY). My cursor keeps a running total that is reset to zero for each new disposal_id, increments it by each row's excess weight, and performs updates when needed; it runs in about 40 seconds. The other solutions I've tried took anywhere from 3-5 minutes. Is there a relatively performant way to do this using set-based operations?
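For what it's worth, on engines with windowed aggregates the whole transformation is a single set-based query: a SUM() OVER with a frame that excludes the current row. The sketch below is not from the thread; it illustrates the idea in SQLite (driven from Python so it can be run as-is), using a hypothetical table name `disposal_rows` and the sample data above. Later versions of SQL Server (2012 and up) accept the same ROWS-framed SUM() OVER syntax.

```python
import sqlite3

# Sample data from the question: (row_id, disposal_id, excess_weight, total_weight)
rows = [
    (1, 1, 0, 30), (2, 1, 10, 30), (3, 1, 0, 30),
    (4, 2, 5, 50), (5, 2, 0, 50), (6, 2, 15, 50),
    (7, 2, 5, 50), (8, 2, 5, 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE disposal_rows (
    row_id INTEGER PRIMARY KEY, disposal_id INTEGER,
    excess_weight INTEGER, total_weight INTEGER)""")
conn.executemany("INSERT INTO disposal_rows VALUES (?, ?, ?, ?)", rows)

# Running sum of excess_weight over the *previous* rows of the same
# disposal_id (the frame ends at 1 PRECEDING, so the current row is
# excluded), subtracted from total_weight. COALESCE covers the first
# row of each disposal_id, whose frame is empty.
query = """
SELECT row_id,
       total_weight - COALESCE(SUM(excess_weight) OVER (
           PARTITION BY disposal_id ORDER BY row_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ), 0) AS new_total
FROM disposal_rows
ORDER BY row_id
"""
result = [new_total for _, new_total in conn.execute(query)]
print(result)  # → [30, 30, 20, 50, 45, 45, 30, 25]
```

The output matches the target table's total_weight column, and the engine can evaluate it in a single ordered pass rather than a triangular join.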

Nate

3 Answers


I've spent a lot of time optimizing such queries and ended up with two performant options: either store precalculated running totals, as described in Denormalizing to enforce business rules: Running Totals, or calculate them on the client, which is also fast and easy.

A-K

The other solution you probably already tried is to do something like the answers found here.

Unless you are using Oracle, which has decent aggregates for cumulative sums, you're better off using a cursor. At best, you're going to have to rejoin the table to itself or use other methods for what should be an O(n) operation. In general, the set-based solutions for problems like these are messy or really messy.
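The "rejoin the table to itself" shape is the triangular pattern the asker already found slow: each row re-reads every earlier row of its disposal_id via a correlated subquery, which is O(n²) overall. A sketch for comparison (illustrated in SQLite via Python; the table name `disposal_rows` is hypothetical):

```python
import sqlite3

# Sample data from the question: (row_id, disposal_id, excess_weight, total_weight)
rows = [
    (1, 1, 0, 30), (2, 1, 10, 30), (3, 1, 0, 30),
    (4, 2, 5, 50), (5, 2, 0, 50), (6, 2, 15, 50),
    (7, 2, 5, 50), (8, 2, 5, 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE disposal_rows (
    row_id INTEGER PRIMARY KEY, disposal_id INTEGER,
    excess_weight INTEGER, total_weight INTEGER)""")
conn.executemany("INSERT INTO disposal_rows VALUES (?, ?, ?, ?)", rows)

# Correlated subquery: for every row, sum the excess of all earlier rows
# of the same disposal_id. Correct, but quadratic in the partition size.
query = """
SELECT t.row_id,
       t.total_weight - COALESCE((SELECT SUM(p.excess_weight)
                                  FROM disposal_rows AS p
                                  WHERE p.disposal_id = t.disposal_id
                                    AND p.row_id < t.row_id), 0) AS new_total
FROM disposal_rows AS t
ORDER BY t.row_id
"""
result = [new_total for _, new_total in conn.execute(query)]
print(result)  # → [30, 30, 20, 50, 45, 45, 30, 25]
```

It produces the right totals on this small sample, but the repeated rescans are why the cursor's single ordered pass wins at scale.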

dfb

'Previous rows' implies an ordering, so no: no set-based operations there.

Oracle's LEAD and LAG are built for this, but SQL Server forces you into triangular joins... which I suppose you have already investigated.

Randy