[edit3]
Doing this with a cursor is the best way I've found in SQL 2008 to do this. On my lowly XP 32-bit VM with 4GB of ram (SSD tho') I processed 500k records in 60 seconds.
Some tl;dr below ...
I keep coming back to this because I've often had to deal with it in the past but thought I didn't have enough time to explore better options beyond the first solution that executed AND got the correct answer.
I realize now that I should have taken the time. The subquery method I provide below is simple to write, but the absolute worst thing to do for performance.
Since cursors are also generally frowned upon, I tried a recursive CTE as the first alternative. It scales well to large recordsets, but there are two downsides:
- The max number of recursions is 32767. If you have 32768 records, you have to break the task into chunks and loop through them. Messy.
- The CTE method requires a sequentially-ordered key with no gaps. This never happens in production, so I had to roll my own, which required mucking around with a
#tmp
table, then updating back to the live table. Totally unnecessary, especially in light of point 1.
This is an example of where "avoid cursors at all cost" is not justified. SQL 2012 does have new methods, but I'm NOT going to run out and buy a Win 7 license just to try them out ...
[edit2]
I realized this morning that relying on TransDate and TransID to always be ordered in a sequential way is a Bad Idea. The real world is full of back-dated transactions and/or ID values which are not necessarily incremented sequentially. In such cases, the code in my original answer will break. So I fixed it as follows:
create table #tmp (
TransID int identity(1,1) not null,
TransDate datetime not null,
Amount money null,
Balance money null,
CompositeKey bigint null,
primary key clustered (TransID)
);
insert into #tmp (TransDate, Amount)
select '2014-02-12', 100
union all
select '2014-02-12', 56
union all
select '2014-02-12', 38
union all
select '2014-02-12', 350
union all
select '2014-02-12', 980
union all
select '2014-02-13', 25
union all
select '2014-02-13', 80
union all
select '2014-02-13', 45
union all
select '2014-02-13', 269
union all
select '2014-02-11', 10000 -- this is an out-of-sequence record which breaks the original code
declare @maxID int = (select MAX(TransID) from #tmp)
set @maxID = power(10,LEN(@maxId))
update #tmp
set CompositeKey = CAST(TransDate as bigint) * @maxID + TransID
create nonclustered index IX_#tmp_CompositeKey
on #tmp (CompositeKey);
update t1
set t1.Balance = t2.Balance
from #tmp as t1
left join (
select t.TransID, t.TransDate, t.Amount,
(
select sum(Amount) as Balance
from #tmp as s
--where s.TransDate <= t.TransDate and s.TransID <= t.TransID -- this gives an improper running balance
where s.CompositeKey <= t.CompositeKey -- this gives the proper running balance
) as Balance
from #tmp as t
) as t2
on t1.TransDate = t2.TransDate and t1.TransId = t2.TransId
select *
from #tmp
order by TransDate, TransID
drop table #tmp
[begin original answer]
Top hit for "sql server running sum" on Google is:
Calculate a Running Total in SQL Server
It contains several in-depth treatments with notes on performance for each kind. Cursors is one method mentioned as higher performing, though I understand your reticence to use them. Are none of these methods sufficient?
[edit]
I've had luck with subqueries:
create table #tmp (
TransID int identity(1,1) not null,
TransDate datetime not null,
Amount money null,
Balance money null,
primary key clustered (TransDate, TransID)
);
insert into #tmp (TransDate, Amount)
select '2014-02-12', 100
union all
select '2014-02-12', 56
union all
select '2014-02-12', 38
union all
select '2014-02-12', 350
union all
select '2014-02-12', 980
union all
select '2014-02-13', 25
union all
select '2014-02-13', 80
union all
select '2014-02-13', 45
union all
select '2014-02-13', 269
union all
select '2014-02-13', 42
update t1
set t1.Balance = t2.Balance
from #tmp as t1
left join (
select t.TransID, t.TransDate, t.Amount,
(
select sum(Amount) as Balance
from #tmp as s
where s.TransDate <= t.TransDate and s.TransID <= t.TransID
) as Balance
from #tmp as t
) as t2
on t1.TransDate = t2.TransDate and t1.TransId = t2.TransId
select *
from #tmp
order by TransDate, TransID
This is the easiest way I can think to write it. In large recordsets this method can bog down, hence my specific design of the primary key. In super large recordsets, for this kind of stuff I break it into chunks using an indexed field (TransDate would be an index candidate here) and then iterate through the updates using a cursor or dynamically generated SQL.
CTEs are another method popular on the referenced article, but writing recursive CTEs breaks my brain and the performance gain isn't at all clear to me.