I have some data that looks like this:

+---+--------+-------------+---------------+--------------+
|   |   A    |      B      |       C       |      D       |
+---+--------+-------------+---------------+--------------+
| 1 | row_id | disposal_id | excess_weight | total_weight |
| 2 | 1      | 1           | 0             | 30           |
| 3 | 2      | 1           | 10            | 30           |
| 4 | 3      | 1           | 0             | 30           |
| 5 | 4      | 2           | 5             | 50           |
| 6 | 5      | 2           | 0             | 50           |
| 7 | 6      | 2           | 15            | 50           |
| 8 | 7      | 2           | 5             | 50           |
| 9 | 8      | 2           | 5             | 50           |
+---+--------+-------------+---------------+--------------+

And I am transforming it to look like this:

+---+--------+-------------+---------------+--------------+
|   |   A    |      B      |       C       |      D       |
+---+--------+-------------+---------------+--------------+
| 1 | row_id | disposal_id | excess_weight | total_weight |
| 2 | 1      | 1           | 0             | 30           |
| 3 | 2      | 1           | 10            | 30           |
| 4 | 3      | 1           | 0             | 20           |
| 5 | 4      | 2           | 5             | 50           |
| 6 | 5      | 2           | 0             | 45           |
| 7 | 6      | 2           | 15            | 45           |
| 8 | 7      | 2           | 5             | 30           |
| 9 | 8      | 2           | 5             | 25           |
+---+--------+-------------+---------------+--------------+

Basically, I need to update the total_weight column by subtracting the sum of the excess_weight values from previous rows (lower row_id) that belong to the same disposal_id.

I'm currently using a cursor because it's faster than the other solutions I've tried (CTE, triangular join, CROSS APPLY). My cursor keeps a running total that is reset to zero for each new disposal_id, increments it by each row's excess weight, and performs updates when needed; it runs in about 40 seconds. The other solutions I've tried took anywhere from 3-5 minutes. Is there a relatively performant way to do this using set-based operations?
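For what it's worth, on engines with windowed aggregates the whole transformation is a single set-based query: a SUM() OVER with a frame that excludes the current row. The sketch below is not from the thread; it illustrates the idea in SQLite (driven from Python so it can be run as-is), using a hypothetical table name `disposal_rows` and the sample data above. Later versions of SQL Server (2012 and up) accept the same ROWS-framed SUM() OVER syntax.

```python
import sqlite3

# Sample data from the question: (row_id, disposal_id, excess_weight, total_weight)
rows = [
    (1, 1, 0, 30), (2, 1, 10, 30), (3, 1, 0, 30),
    (4, 2, 5, 50), (5, 2, 0, 50), (6, 2, 15, 50),
    (7, 2, 5, 50), (8, 2, 5, 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE disposal_rows (
    row_id INTEGER PRIMARY KEY, disposal_id INTEGER,
    excess_weight INTEGER, total_weight INTEGER)""")
conn.executemany("INSERT INTO disposal_rows VALUES (?, ?, ?, ?)", rows)

# Running sum of excess_weight over the *previous* rows of the same
# disposal_id (the frame ends at 1 PRECEDING, so the current row is
# excluded), subtracted from total_weight. COALESCE covers the first
# row of each disposal_id, whose frame is empty.
query = """
SELECT row_id,
       total_weight - COALESCE(SUM(excess_weight) OVER (
           PARTITION BY disposal_id ORDER BY row_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ), 0) AS new_total
FROM disposal_rows
ORDER BY row_id
"""
result = [new_total for _, new_total in conn.execute(query)]
print(result)  # → [30, 30, 20, 50, 45, 45, 30, 25]
```

The output matches the target table's total_weight column, and the engine can evaluate it in a single ordered pass rather than a triangular join.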

Nate

3 Answers


I've spent a lot of time optimizing such queries and ended up with two performant options: either store precalculated running totals, as described in Denormalizing to enforce business rules: Running Totals, or calculate them on the client, which is also fast and easy.

A-K

The other solution you probably already tried is to do something like the answers found here.

Unless you are using Oracle, which has decent aggregates for cumulative sums, you're better off using a cursor. At best, you're going to have to rejoin the table to itself or use other methods for what should be an O(n) operation. In general, the set-based solutions for problems like these are messy or really messy.
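The "rejoin the table to itself" shape is the triangular pattern the asker already found slow: each row re-reads every earlier row of its disposal_id via a correlated subquery, which is O(n²) overall. A sketch for comparison (illustrated in SQLite via Python; the table name `disposal_rows` is hypothetical):

```python
import sqlite3

# Sample data from the question: (row_id, disposal_id, excess_weight, total_weight)
rows = [
    (1, 1, 0, 30), (2, 1, 10, 30), (3, 1, 0, 30),
    (4, 2, 5, 50), (5, 2, 0, 50), (6, 2, 15, 50),
    (7, 2, 5, 50), (8, 2, 5, 50),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE disposal_rows (
    row_id INTEGER PRIMARY KEY, disposal_id INTEGER,
    excess_weight INTEGER, total_weight INTEGER)""")
conn.executemany("INSERT INTO disposal_rows VALUES (?, ?, ?, ?)", rows)

# Correlated subquery: for every row, sum the excess of all earlier rows
# of the same disposal_id. Correct, but quadratic in the partition size.
query = """
SELECT t.row_id,
       t.total_weight - COALESCE((SELECT SUM(p.excess_weight)
                                  FROM disposal_rows AS p
                                  WHERE p.disposal_id = t.disposal_id
                                    AND p.row_id < t.row_id), 0) AS new_total
FROM disposal_rows AS t
ORDER BY t.row_id
"""
result = [new_total for _, new_total in conn.execute(query)]
print(result)  # → [30, 30, 20, 50, 45, 45, 30, 25]
```

It produces the right totals on this small sample, but the repeated rescans are why the cursor's single ordered pass wins at scale.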

dfb

'Previous rows' implies an ordering, so no: no set-based operations there.

Oracle's LEAD and LAG are built for this, but SQL Server forces you into triangular joins... which I suppose you have already investigated.

Randy