190

Imagine the following table (called TestTable):

id     somedate    somevalue
--     --------    ---------
45     01/Jan/09   3
23     08/Jan/09   5
12     02/Feb/09   0
77     14/Feb/09   7
39     20/Feb/09   34
33     02/Mar/09   6

I would like a query that returns a running total in date order, like:

id     somedate    somevalue  runningtotal
--     --------    ---------  ------------
45     01/Jan/09   3          3
23     08/Jan/09   5          8
12     02/Feb/09   0          8
77     14/Feb/09   7          15  
39     20/Feb/09   34         49
33     02/Mar/09   6          55

I know there are various ways of doing this in SQL Server 2000 / 2005 / 2008.

I am particularly interested in this sort of method that uses the aggregating-set-statement trick:

INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal) 
   SELECT id, somedate, somevalue, null
   FROM TestTable
   ORDER BY somedate

DECLARE @RunningTotal int
SET @RunningTotal = 0

UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl

... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.

But maybe there are other ways that people can suggest?

edit: Now with a SqlFiddle with the setup and the 'update trick' example above

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
codeulike
  • 22,514
  • 29
  • 120
  • 167

15 Answers15

158

Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947

The problem is that the SQL Server implementation of the Over clause is somewhat limited.

Oracle (and ANSI-SQL) allow you to do things like:

 SELECT somedate, somevalue,
  SUM(somevalue) OVER(ORDER BY somedate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM Table

SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.

The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):

Full working sample:

drop table #t 
create table #t ( ord int primary key, total int, running_total int)

insert #t(ord,total)  values (2,20)
-- notice the malicious re-ordering 
insert #t(ord,total) values (1,10)
insert #t(ord,total)  values (3,10)
insert #t(ord,total)  values (4,1)

declare @total int 
set @total = 0
update #t set running_total = @total, @total = @total + total 

select * from #t
order by ord 

ord         total       running_total
----------- ----------- -------------
1           10          10
2           20          30
3           10          40
4           1           41

You asked for a benchmark this is the lowdown.

The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.

The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.

Bottom line, for production code I would go with the cursor.

Test data:

create table #t ( ord int primary key, total int, running_total int)

set nocount on 
declare @i int
set @i = 0 
begin tran
while @i < 10000
begin
   insert #t (ord, total) values (@i,  rand() * 100) 
    set @i = @i +1
end
commit

Test 1:

SELECT ord,total, 
    (SELECT SUM(total) 
        FROM #t b 
        WHERE b.ord <= a.ord) AS b 
FROM #t a

-- CPU 11731, Reads 154934, Duration 11135 

Test 2:

SELECT a.ord, a.total, SUM(b.total) AS RunningTotal 
FROM #t a CROSS JOIN #t b 
WHERE (b.ord <= a.ord) 
GROUP BY a.ord,a.total 
ORDER BY a.ord

-- CPU 16053, Reads 154935, Duration 4647

Test 3:

DECLARE @TotalTable table(ord int primary key, total int, running_total int)

DECLARE forward_cursor CURSOR FAST_FORWARD 
FOR 
SELECT ord, total
FROM #t 
ORDER BY ord


OPEN forward_cursor 

DECLARE @running_total int, 
    @ord int, 
    @total int
SET @running_total = 0

FETCH NEXT FROM forward_cursor INTO @ord, @total 
WHILE (@@FETCH_STATUS = 0)
BEGIN
     SET @running_total = @running_total + @total
     INSERT @TotalTable VALUES(@ord, @total, @running_total)
     FETCH NEXT FROM forward_cursor INTO @ord, @total 
END

CLOSE forward_cursor
DEALLOCATE forward_cursor

SELECT * FROM @TotalTable

-- CPU 359, Reads 30392, Duration 496

Test 4:

declare @total int 
set @total = 0
update #t set running_total = @total, @total = @total + total 

select * from #t

-- CPU 0, Reads 58, Duration 139
Community
  • 1
  • 1
Sam Saffron
  • 128,308
  • 78
  • 326
  • 506
  • 1
    Thanks. So your code sample is to demonstrate that it will sum in the order of the primary key, I presume. It would be interesting to know if cursors are still more efficient than joins for larger data sets. – codeulike May 14 '09 at 08:37
  • I'm sure you can find stats on google or your favorite search engine.. but the quick answer is that set-based operations are much faster than cursor operations when working with large sets of data. – Sam Axe May 14 '09 at 18:15
  • I'm not overly familiar with cursors but are there any circumstances in which this would be a better option than a [recursive CTE](http://stackoverflow.com/questions/7357516/subquery-or-leftjoin-with-group-by-which-one-is-faster/7454564#7454564)? Both seem to operate similarly in terms of seeking in and getting the "next" row but the CTE avoids the need to populate and select from an intermediate table. – Martin Smith Sep 18 '11 at 14:23
  • 1
    I just tested the CTE @Martin, nothing comes close to the update trick - cursor seems lower on reads. Here is a profiler trace http://i.stack.imgur.com/BbZq3.png – Sam Saffron Sep 19 '11 at 00:42
  • @Sam - Yep. The Recursive CTE reads get massively boosted by the index spool that stores the output of the previous row then is read from again in the next iteration. It seems to count 6 worktable reads per source row from the output of `STATISTICS IO` on (table in my previous link). Not sure why 6 though. – Martin Smith Sep 19 '11 at 01:17
  • 3
    @Martin Denali is going to have a pretty nice solution for this http://msdn.microsoft.com/en-us/library/ms189461(v=SQL.110).aspx – Sam Saffron Sep 19 '11 at 02:18
  • @Sam - Yep. the extended `OVER` clause should make cursors pretty much redundant for everything except maintenance scripts I would have thought. BTW just remembered that logical reads are reported differently for worktables as per the answer to one of my questions [here](http://stackoverflow.com/questions/4230838/why-are-logical-reads-for-windowed-aggregate-functions-so-high/5194902#5194902) – Martin Smith Sep 19 '11 at 09:48
  • 1
    +1 for all the work put into this answer - I love the UPDATE option; can a partition be built into this UPDATE script? e.g if there was an additional field "Car Colour" could this script return running totals within each "Car Colour" partition? – whytheq Aug 10 '12 at 14:02
  • +1, nice answer. It's pity we cannot specify order into `update` option, and without it it's dangerous to put into production code – Roman Pekar Nov 27 '12 at 11:14
  • @RomanPekar I would have a check to see if denali has support for anything more fancy here, they beefed up the windowing stuff – Sam Saffron Nov 27 '12 at 21:42
  • @SamSaffron thanks, but unfortunately, my production server is 2008 R2 so I cannot use this awesome windowing functions in real world :( – Roman Pekar Nov 28 '12 at 06:48
  • @SamSaffron in the article here - http://weblogs.sqlteam.com/peterl/archive/2008/05/13/Lightning-fast-collapsed-date-ranges-and-missing-date-ranges.aspx this `update` is called`documented trick - the Clustered Index Update`. So I think that order could be specified by creating clustered index by ord. That said, I'd like to actually see real documentation about this `trick` before putting this into production code – Roman Pekar Nov 13 '14 at 11:39
  • @SamSaffron your solution is a widely useful specially in running total balance, the user may change the master key to get more performance, Test 4: is the best one. – Marwan Almukh Oct 28 '17 at 05:27
  • 2
    the initial (Oracle (and ANSI-SQL) ) answer now works in SQL server 2017. Thank you, very elegant! – DaniDev Sep 24 '19 at 17:36
  • Jeff Moden has a couple of big writeups on SQLServerCentral.com (https://www.sqlservercentral.com/articles/solving-the-running-total-and-ordinal-rank-problems-rewritten) about using the "quirky update" method in SQL Server/Sybase - it is the fastest, but does require one to do things to ensure you have a MAXDOP=1, etc. to avoid any parallelisms. His articles demonstrate that Cursors are still slow w.r.t. this. The new additions to the windowed partition functions are almost as performant as the quirky-update method, as others have also noted. – user1390375 Apr 11 '23 at 23:08
137

In SQL Server 2012 you can use SUM() with the OVER() clause.

select id,
       somedate,
       somevalue,
       sum(somevalue) over(order by somedate rows unbounded preceding) as runningtotal
from TestTable

SQL Fiddle

Mikael Eriksson
  • 136,425
  • 22
  • 210
  • 281
40

While Sam Saffron did great work on it, he still didn't provide recursive common table expression code for this problem. And for us who working with SQL Server 2008 R2 and not Denali, it's still fastest way to get running total, it's about 10 times faster than cursor on my work computer for 100000 rows, and it's also inline query.
So, here it is (I'm supposing that there's an ord column in the table and it's sequential number without gaps, for fast processing there also should be unique constraint on this number):

;with 
CTE_RunningTotal
as
(
    select T.ord, T.total, T.total as running_total
    from #t as T
    where T.ord = 0
    union all
    select T.ord, T.total, T.total + C.running_total as running_total
    from CTE_RunningTotal as C
        inner join #t as T on T.ord = C.ord + 1
)
select C.ord, C.total, C.running_total
from CTE_RunningTotal as C
option (maxrecursion 0)

-- CPU 140, Reads 110014, Duration 132

sql fiddle demo

update I also was curious about this update with variable or quirky update. So usually it works ok, but how we can be sure that it works every time? well, here's a little trick (found it here - http://www.sqlservercentral.com/Forums/Topic802558-203-21.aspx#bm981258) - you just check current and previous ord and use 1/0 assignment in case they are different from what you expecting:

declare @total int, @ord int

select @total = 0, @ord = -1

update #t set
    @total = @total + total,
    @ord = case when ord <> @ord + 1 then 1/0 else ord end,
    ------------------------
    running_total = @total

select * from #t

-- CPU 0, Reads 58, Duration 139

From what I've seen if you have proper clustered index/primary key on your table (in our case it would be index by ord_id) update will proceed in a linear way all the time (never encountered divide by zero). That said, it's up to you to decide if you want to use it in production code :)

update 2 I'm linking this answer, cause it includes some useful info about unreliability of the quirky update - nvarchar concatenation / index / nvarchar(max) inexplicable behavior.

Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
  • there should be a sequential number so you can join on ord = ord + 1 and sometimes it needs a little more work. But anyway, on SQL 2008 R2 I'm using this solution – Roman Pekar Jun 24 '13 at 10:21
  • +1 On SQLServer2008R2 I also prefer approach with recursive CTE. FYI, in order to find the value for the tables, which allow gaps I use a correlated sub-query. It adds two additional seek operations to the query http://sqlfiddle.com/#!3/d41d8/18967 – Aleksandr Fedorenko Aug 18 '13 at 14:08
  • Yes, it could be made with apply query, but this one is very fast, I couldn't say that about correlated subquery :( – Roman Pekar Aug 18 '13 at 14:13
  • 2
    For the case where you already have an ordinal for your data and you'r e looking for concise (non cursor) set based solution on SQL 2008 R2, this appears to be perfect. – Nick.Mc Feb 25 '14 at 01:03
  • 1
    Not every running total query will have an ordinal field that is contiguous. Sometimes a datetime field is what you have, or records have been deleted from the middle of the sort. That might be why it doesn't get used more often. – Reuben Oct 29 '14 at 04:17
  • 1
    @Reuben if your table is small enough, you always can dump it into temp table with sequential numbers, but yes, sometimes this solution could not be applied easily – Roman Pekar Oct 29 '14 at 07:55
  • 1
    @RomanPekar Just seen your 2014 update with the order check and divide by zero ejector seat, amazing idea! – codeulike Dec 08 '22 at 11:25
28

The APPLY operator in SQL 2005 and higher works for this:

select
    t.id ,
    t.somedate ,
    t.somevalue ,
    rt.runningTotal
from TestTable t
 cross apply (select sum(somevalue) as runningTotal
                from TestTable
                where somedate <= t.somedate
            ) as rt
order by t.somedate
Mike Forman
  • 4,367
  • 1
  • 21
  • 18
  • 5
    Works very well for smaller datasets. A downside is you'll have to have identical where clauses on the inner and outer query. – Sire Sep 12 '12 at 14:29
  • Since some of my dates were exactly the same (down to the fraction of a second) I had to add: row_number() over (order by txndate) to the inner and outer table and a few compound indices to make it run. Slick/simple solution. BTW, tested cross apply against subquery ... it's slightly faster. – pghcpa Jan 05 '15 at 04:39
  • this is very clean and does work well with small data sets; faster than the recursive CTE – jtate Jun 23 '16 at 15:41
  • this is nice solution as well (for small data sets), but you also have to be aware that it implies somedate column to be unique – Roman Pekar Sep 29 '16 at 08:04
11
SELECT TOP 25   amount, 
    (SELECT SUM(amount) 
    FROM time_detail b 
    WHERE b.time_detail_id <= a.time_detail_id) AS Total FROM time_detail a

You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.

Sam Axe
  • 33,313
  • 9
  • 55
  • 89
  • 1
    This is really inefficient ... but then again there is no real clean way of doing this in sql server – Sam Saffron May 14 '09 at 00:36
  • Absolutely it is inefficient - but it does the job and there's no question of whether something for executed in the right or wrong order. – Sam Axe May 14 '09 at 06:35
  • thanks, its useful to have alternative answers, and also useful to have efficienty critique – codeulike May 14 '09 at 08:30
7

Use a correlated sub-query. Very simple, here you go:

SELECT 
somedate, 
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
GROUP BY somedate
ORDER BY somedate

The code might not be exactly correct, but I'm sure that the idea is.

The GROUP BY is in case a date appears more than once, you would only want to see it once in the result set.

If you don't mind seeing repeating dates, or you want to see the original value and id, then the following is what you want:

SELECT 
id,
somedate, 
somevalue,
(SELECT SUM(somevalue) FROM TestTable t2 WHERE t2.somedate<=t1.somedate) AS running_total
FROM TestTable t1
ORDER BY somedate
KthProg
  • 2,050
  • 1
  • 24
  • 32
  • Thanks... simple was great. There was an index to add for performance, but that was simple enough, (taking one of the recommendations from Database Engine Tuning Advisor ;), and then it ran like a shot. – Doug_Ivison Apr 28 '15 at 22:09
5

If you are using Sql server 2008 R2 above. Then, It would be shortest way to do;

Select id
    ,somedate
    ,somevalue,
LAG(runningtotal) OVER (ORDER BY somedate) + somevalue AS runningtotal
From TestTable 

LAG is use to get previous row value. You can do google for more info.

[1]:

AaA
  • 3,600
  • 8
  • 61
  • 86
shambhu yadav
  • 231
  • 2
  • 13
  • 1
    I believe [LAG](https://learn.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql) only exists in SQL server 2012 and above (not 2008) – AaA Oct 23 '17 at 04:35
  • 1
    Using LAG() does not improve on `SUM(somevalue) OVER(...)` which seems a lot cleaner to me – Paul Maxwell Oct 23 '17 at 06:45
5

You can also denormalize - store running totals in the same table:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

Selects work much faster than any other solutions, but modifications may be slower

A-K
  • 16,804
  • 8
  • 54
  • 74
4

Though best way is to get it done will be using a window function, it can also be done using a simple correlated sub-query.

Select id, someday, somevalue, (select sum(somevalue) 
                                from testtable as t2
                                where t2.id = t1.id
                                and t2.someday <= t1.someday) as runningtotal
from testtable as t1
order by id,someday;
Krahul3
  • 37
  • 3
4

Here are 2 simple ways to calculate running total:

Approach 1: It can be written this way if your DBMS supports Analytical Functions

SELECT     id
           ,somedate
           ,somevalue
           ,runningtotal = SUM(somevalue) OVER (ORDER BY somedate ASC)
FROM       TestTable

Approach 2: You can make use of OUTER APPLY if your database version / DBMS itself does not support Analytical Functions

SELECT     T.id
           ,T.somedate
           ,T.somevalue
           ,runningtotal = OA.runningtotal
FROM       TestTable T
           OUTER APPLY (
                           SELECT   runningtotal = SUM(TI.somevalue)
                           FROM     TestTable TI
                           WHERE    TI.somedate <= S.somedate
                       ) OA;

Note:- If you have to calculate the running total for different partitions separately, it can be done as posted here: Calculating Running totals across rows and grouping by ID

san
  • 1,415
  • 8
  • 13
4

Assuming that windowing works on SQL Server 2008 like it does elsewhere (that I've tried), give this a go:

select testtable.*, sum(somevalue) over(order by somedate)
from testtable
order by somedate;

MSDN says it's available in SQL Server 2008 (and maybe 2005 as well?) but I don't have an instance to hand to try it.

EDIT: well, apparently SQL Server doesn't allow a window specification ("OVER(...)") without specifying "PARTITION BY" (dividing the result up into groups but not aggregating in quite the way GROUP BY does). Annoying-- the MSDN syntax reference suggests that its optional, but I only have SqlServer 2000 instances around at the moment.

The query I gave works in both Oracle 10.2.0.3.0 and PostgreSQL 8.4-beta. So tell MS to catch up ;)

araqnid
  • 127,052
  • 24
  • 157
  • 134
  • 2
    Using OVER with SUM will not work in this case to give a running total. The OVER clause does not accept ORDER BY when used with SUM. You have to use PARTITION BY, which will not work for running totals. – Sam Axe May 14 '09 at 00:23
  • thanks, its actually useful to hear why this wont work. araqnid maybe you could edit your answer to explain why its not an option – codeulike May 14 '09 at 08:33
  • [Coming in SQL Server 2011 apparently](http://twitter.com/#!/AdamMachanic/status/70868079046963200) – Martin Smith May 18 '11 at 21:08
  • This actually works for me, because I need to partition - so even though this isn't the most-popular answer, it is the easiest solution to my problem for RT in SQL. – William M-B Mar 13 '13 at 16:32
  • I don't have MSSQL 2008 with me, but I think you could probably partition by (select null) and hack around the partitioning problem. Or make a subselect with `1 partitionme` and partition by that. Also, partition by is probably needed in real life situations when doing reports. – nurettin Jul 19 '17 at 14:30
2

Using join Another variation is to use join. Now the query could look like:

    SELECT a.id, a.value, SUM(b.Value)FROM   RunTotalTestData a,
    RunTotalTestData b
    WHERE b.id <= a.id
    GROUP BY a.id, a.value 
    ORDER BY a.id;

for more you can visite this link http://askme.indianyouth.info/details/calculating-simple-running-totals-in-sql-server-12

Harikesh Yadav
  • 185
  • 1
  • 5
2

I believe a running total can be achieved using the simple INNER JOIN operation below.

SELECT
     ROW_NUMBER() OVER (ORDER BY SomeDate) AS OrderID
    ,rt.*
INTO
    #tmp
FROM
    (
        SELECT 45 AS ID, CAST('01-01-2009' AS DATETIME) AS SomeDate, 3 AS SomeValue
        UNION ALL
        SELECT 23, CAST('01-08-2009' AS DATETIME), 5
        UNION ALL
        SELECT 12, CAST('02-02-2009' AS DATETIME), 0
        UNION ALL
        SELECT 77, CAST('02-14-2009' AS DATETIME), 7
        UNION ALL
        SELECT 39, CAST('02-20-2009' AS DATETIME), 34
        UNION ALL
        SELECT 33, CAST('03-02-2009' AS DATETIME), 6
    ) rt

SELECT
     t1.ID
    ,t1.SomeDate
    ,t1.SomeValue
    ,SUM(t2.SomeValue) AS RunningTotal
FROM
    #tmp t1
    JOIN #tmp t2
        ON t2.OrderID <= t1.OrderID
GROUP BY
     t1.OrderID
    ,t1.ID
    ,t1.SomeDate
    ,t1.SomeValue
ORDER BY
    t1.OrderID

DROP TABLE #tmp
clevster
  • 21
  • 2
2

The following will produce the required results.

SELECT a.SomeDate,
       a.SomeValue,
       SUM(b.SomeValue) AS RunningTotal
FROM TestTable a
CROSS JOIN TestTable b
WHERE (b.SomeDate <= a.SomeDate) 
GROUP BY a.SomeDate,a.SomeValue
ORDER BY a.SomeDate,a.SomeValue

Having a clustered index on SomeDate will greatly improve the performance.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Dave Barker
  • 6,303
  • 2
  • 24
  • 25
  • @Dave I think this question is trying to find an efficient way of doing this, cross joining is going to be really slow for large sets – Sam Saffron May 14 '09 at 00:44
0
BEGIN TRAN
CREATE TABLE #Table (_Id INT IDENTITY(1,1) ,id INT ,    somedate VARCHAR(100) , somevalue INT)


INSERT INTO #Table ( id  ,    somedate  , somevalue  )
SELECT 45 , '01/Jan/09', 3 UNION ALL
SELECT 23 , '08/Jan/09', 5 UNION ALL
SELECT 12 , '02/Feb/09', 0 UNION ALL
SELECT 77 , '14/Feb/09', 7 UNION ALL
SELECT 39 , '20/Feb/09', 34 UNION ALL
SELECT 33 , '02/Mar/09', 6 

;WITH CTE ( _Id, id  ,  _somedate  , _somevalue ,_totvalue ) AS
(

 SELECT _Id , id  ,    somedate  , somevalue ,somevalue
 FROM #Table WHERE _id = 1
 UNION ALL
 SELECT #Table._Id , #Table.id  , somedate  , somevalue , somevalue + _totvalue
 FROM #Table,CTE 
 WHERE #Table._id > 1 AND CTE._Id = ( #Table._id-1 )
)

SELECT * FROM CTE

ROLLBACK TRAN
TT.
  • 15,774
  • 6
  • 47
  • 88
Mansoor
  • 4,061
  • 1
  • 17
  • 27
  • You should probably give some information as to what you are doing here, and note any advantages/disadvantages of this particular method. – TT. Nov 04 '16 at 07:17