0

I wish to update a table with a cumulative sum. The update must happen in a particular order or else the cumulative value is wrong.

Consider a table Trans with the following columns:

TransID, TransDate, Amount, Balance

The TransDate column holds only the date and not the time. Thus, there may be many transactions on the same day and the correct order would then be determined by TransID.

Here is what I want to do:

Declare @Total as int = 0
Update Trans Set @Total=@Total+Amount,Balance=@Total Order By TransDate,TransID

The SQL doesn't work because of the Order By clause. If I remove the Order By clause, it would work only if all transactions had been entered in order.

I've tried searching other posts but couldn't find a satisfactory answer. My only other option is to create a SP and use a cursor to step through each transaction and update one by one.

Any ideas?

navigator
  • 1,678
  • 16
  • 29

2 Answers2

0

[edit3]

Doing this with a cursor is the best way I've found in SQL 2008 to do this. On my lowly XP 32-bit VM with 4GB of ram (SSD tho') I processed 500k records in 60 seconds.

Some tl;dr below ...

I keep coming back to this because I've often had to deal with it in the past but thought I didn't have enough time to explore better options beyond the first solution that executed AND got the correct answer.

I realize now that I should have taken the time. The subquery method I provide below is simple to write, but the absolute worst thing to do for performance.

Since cursors are also generally frowned upon, I tried a recursive CTE as the first alternative. It scales well to large recordsets, but there are two downsides:

  1. The max number of recursions is 32767. If you have 32768 records, you have to break the task into chunks and loop through them. Messy.
  2. The CTE method requires a sequentially-ordered key with no gaps. This never happens in production, so I had to roll my own, which required mucking around with a #tmp table, then updating back to the live table. Totally unnecessary, especially in light of point 1.

This is an example of where "avoid cursors at all cost" is not justified. SQL 2012 does have new methods, but I'm NOT going to run out and buy a Win 7 license just to try them out ...

[edit2]

I realized this morning that relying on TransDate and TransID to always be ordered in a sequential way is a Bad Idea. The real world is full of back-dated transactions and/or ID values which are not necessarily incremented sequentially. In such cases, the code in my original answer will break. So I fixed it as follows:

create table #tmp (
    TransID int identity(1,1) not null, 
    TransDate datetime not null, 
    Amount money null, 
    Balance money null,
    CompositeKey bigint null,
    primary key clustered (TransID)
);

insert into #tmp (TransDate, Amount)
select '2014-02-12', 100
union all
select '2014-02-12', 56
union all
select '2014-02-12', 38
union all
select '2014-02-12', 350
union all
select '2014-02-12', 980
union all
select '2014-02-13', 25
union all
select '2014-02-13', 80
union all
select '2014-02-13', 45
union all
select '2014-02-13', 269
union all
select '2014-02-11', 10000 -- this is an out-of-sequence record which breaks the original code

declare @maxID int = (select MAX(TransID) from #tmp)
set @maxID = power(10,LEN(@maxId))

update #tmp
set CompositeKey = CAST(TransDate as bigint) * @maxID + TransID

create nonclustered index IX_#tmp_CompositeKey
    on #tmp (CompositeKey);

update t1
set t1.Balance = t2.Balance
from #tmp as t1
left join (
    select t.TransID, t.TransDate, t.Amount, 
        (
            select sum(Amount) as Balance
            from #tmp as s
            --where s.TransDate <= t.TransDate and s.TransID <= t.TransID -- this gives an improper running balance
            where s.CompositeKey <= t.CompositeKey -- this gives the proper running balance
        ) as Balance
    from #tmp as t
) as t2
    on t1.TransDate = t2.TransDate and t1.TransId = t2.TransId

select *
from #tmp
order by TransDate, TransID

drop table #tmp

[begin original answer]

Top hit for "sql server running sum" on Google is:

Calculate a Running Total in SQL Server

It contains several in-depth treatments with notes on performance for each kind. Cursors is one method mentioned as higher performing, though I understand your reticence to use them. Are none of these methods sufficient?

[edit]

I've had luck with subqueries:

create table #tmp (
TransID int identity(1,1) not null, 
TransDate datetime not null, 
Amount money null, 
Balance money null,
primary key clustered (TransDate, TransID)
);

insert into #tmp (TransDate, Amount)
select '2014-02-12', 100
union all
select '2014-02-12', 56
union all
select '2014-02-12', 38
union all
select '2014-02-12', 350
union all
select '2014-02-12', 980
union all
select '2014-02-13', 25
union all
select '2014-02-13', 80
union all
select '2014-02-13', 45
union all
select '2014-02-13', 269
union all
select '2014-02-13', 42

update t1
set t1.Balance = t2.Balance
from #tmp as t1
left join (
    select t.TransID, t.TransDate, t.Amount, 
        (
            select sum(Amount) as Balance
            from #tmp as s
            where s.TransDate <= t.TransDate and s.TransID <= t.TransID
        ) as Balance
    from #tmp as t
) as t2
    on t1.TransDate = t2.TransDate and t1.TransId = t2.TransId

select *
from #tmp
order by TransDate, TransID

This is the easiest way I can think to write it. In large recordsets this method can bog down, hence my specific design of the primary key. In super large recordsets, for this kind of stuff I break it into chunks using an indexed field (TransDate would be an index candidate here) and then iterate through the updates using a cursor or dynamically generated SQL.

CTEs are another method popular on the referenced article, but writing recursive CTEs breaks my brain and the performance gain isn't at all clear to me.

Community
  • 1
  • 1
  • The solution recommends using a cursor... which is the next option I would consider. I want to know if there is a better way to do it within one statement... The other methods perform worse than the cursor. – navigator Feb 13 '14 at 09:18
0

Here's one method:

DECLARE @t table (
   transid   int identity(9,37)
 , transdate datetime
 , amount    decimal(15,4)
 , balance   decimal(15,4)
);

INSERT INTO @t (transdate, amount)
  VALUES ('2014-02-12', 100)
       , ('2014-02-12',  56)
       , ('2014-02-12',  38)
       , ('2014-02-12', 350)
       , ('2014-02-12', 980)
       , ('2014-02-13',  25)
       , ('2014-02-13',  80)
       , ('2014-02-13',  45)
       , ('2014-02-13', 269)
       , ('2014-02-13',  42)
;

; WITH x AS (
  SELECT transid
       , amount
       , Row_Number() OVER (ORDER BY transdate) As sequence
  FROM   @t
)
SELECT x.transid
     , x.amount
     , x.sequence
     , Sum(prev.amount) As running_sum
FROM   x
 LEFT
  JOIN x As prev
    ON prev.sequence <= x.sequence
GROUP
    BY x.transid
     , x.amount
     , x.sequence

The idea here is that you provide a sequential number to each row in the series using a windowed function - Row_Number(). This allows the ability to join to "previous" rows based on this sequencing.

If you remove the grouping you will notice that where x.sequence = 1 there is a single row. For x.sequence = 2 there are two rows (prev.sequence 1 and 2), and so on.

Because of this the method won't be very efficient as the result set gets larger.

gvee
  • 16,732
  • 35
  • 50