Multiple Running Totals with Group By

Question

I am struggling to find a good way to run running totals with a group by in it, or the equivalent. The below cursor based running total works on a complete table, but I would like to expand this to add a "Client" dimension. So I would get running totals as the below creates but for each company (ie Company A, Company B, Company C, etc.) in one table

CREATE TABLE test (tag int,  Checks float, AVG_COST float, Check_total float,  Check_amount float, Amount_total float, RunningTotal_Check float,  
 RunningTotal_Amount float)

DECLARE @tag int,
        @Checks float,
        @AVG_COST float,
        @check_total float,
        @Check_amount float,
        @amount_total float,
        @RunningTotal_Check float ,
        @RunningTotal_Check_PCT float,
        @RunningTotal_Amount float



SET @RunningTotal_Check = 0
SET @RunningTotal_Check_PCT = 0
SET @RunningTotal_Amount = 0
DECLARE aa_cursor CURSOR fast_forward
FOR
SELECT tag, Checks, AVG_COST, check_total, check_amount, amount_total
FROM test_3

OPEN aa_cursor
FETCH NEXT FROM aa_cursor INTO @tag,  @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total
WHILE @@FETCH_STATUS = 0
 BEGIN
  SET @RunningTotal_CHeck = @RunningTotal_CHeck + @checks
  set @RunningTotal_Amount = @RunningTotal_Amount + @Check_amount
  INSERT test VALUES (@tag, @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total,  @RunningTotal_check, @RunningTotal_Amount )
  FETCH NEXT FROM aa_cursor INTO @tag, @Checks, @AVG_COST, @check_total, @Check_amount, @amount_total
 END

CLOSE aa_cursor
DEALLOCATE aa_cursor

SELECT *, RunningTotal_Check/Check_total as CHECK_RUN_PCT, round((RunningTotal_Check/Check_total *100),0) as CHECK_PCT_BIN,  RunningTotal_Amount/Amount_total as Amount_RUN_PCT,  round((RunningTotal_Amount/Amount_total * 100),0) as Amount_PCT_BIN
into test_4
FROM test ORDER BY tag
create clustered index IX_TESTsdsdds3 on test_4(tag)

DROP TABLE test

----------------------------------

I can the the running total for any 1 company but I would like to do it for multiple to produce something like the results below.

CLIENT  COUNT   Running Total
Company A   1   6.7%
Company A   2   20.0%
Company A   3   40.0%
Company A   4   66.7%
Company A   5   100.0%
Company B   1   3.6%
Company B   2   10.7%
Company B   3   21.4%
Company B   4   35.7%
Company B   5   53.6%
Company B   6   75.0%
Company B   7   100.0%
Company C   1   3.6%
Company C   2   10.7%
Company C   3   21.4%
Company C   4   35.7%
Company C   5   53.6%
Company C   6   75.0%
Company C   7   100.0%

What version of SqlServer is this? – RBarryYoung Apr 28 '12 at 23:16 — RBarryYoung, Apr 28 '12 at 23:16

score 5 · Answer 1 · answered Apr 29 '12 at 00:11

This is finally simple to do in SQL Server 2012, where SUM and COUNT support OVER clauses that contain ORDER BY. Using Cris's #Checks table definition:

SELECT
  CompanyID,
  count(*) over (
    partition by CompanyID
    order by Cleared, ID
  ) as cnt,
  str(100.0*sum(Amount) over (
    partition by CompanyID
    order by Cleared, ID
  )/
  sum(Amount) over (
    partition by CompanyID
  ),5,1)+'%' as RunningTotalForThisCompany
FROM #Checks;

SQL Fiddle here.

+1 even though it's unlikely the OP is on SQL Server 2012. :-) — Aaron Bertrand, Apr 29 '12 at 01:29

Aaron Bertrand · Answer 2 · 2012-07-24T15:25:00.383

I originally started posting the SQL Server 2012 equivalent (since you didn't mention what version you were using). Steve has done a great job of showing the simplicity of this calculation in the newest version of SQL Server, so I'll focus on a few methods that work on earlier versions of SQL Server (back to 2005).

I'm going to take some liberties with your schema, since I can't figure out what all these #test and #test_3 and #test_4 temporary tables are supposed to represent. How about:

USE tempdb;
GO

CREATE TABLE dbo.Checks
(
  Client VARCHAR(32),
  CheckDate DATETIME,
  Amount DECIMAL(12,2)
);

INSERT dbo.Checks(Client, CheckDate, Amount)
          SELECT 'Company A', '20120101', 50
UNION ALL SELECT 'Company A', '20120102', 75
UNION ALL SELECT 'Company A', '20120103', 120
UNION ALL SELECT 'Company A', '20120104', 40
UNION ALL SELECT 'Company B', '20120101', 75
UNION ALL SELECT 'Company B', '20120105', 200
UNION ALL SELECT 'Company B', '20120107', 90;

Expected output in this case:

Client    Count Running Total
--------- ----- -------------
Company A 1     17.54
Company A 2     43.86
Company A 3     85.96
Company A 4     100.00
Company B 1     20.55
Company B 2     75.34
Company B 3     100.00

One way:

;WITH gt(Client, Totals) AS 
(
  SELECT Client, SUM(Amount) 
    FROM dbo.Checks AS c
    GROUP BY Client
), n (Client, Amount, rn) AS
(
  SELECT c.Client, c.Amount, 
    ROW_NUMBER() OVER  (PARTITION BY c.Client ORDER BY c.CheckDate)
    FROM dbo.Checks AS c
)
SELECT n.Client, [Count] = n.rn, 
  [Running Total] = CONVERT(DECIMAL(5,2), 100.0*(
    SELECT SUM(Amount) FROM n AS n2 
    WHERE Client = n.Client AND rn <= n.rn)/gt.Totals
 )
 FROM n INNER JOIN gt ON n.Client = gt.Client
 ORDER BY n.Client, n.rn;

A slightly faster alternative - more reads but shorter duration and simpler plan:

;WITH x(Client, CheckDate, rn, rt, gt) AS 
(
   SELECT Client, CheckDate, rn = ROW_NUMBER() OVER
   (PARTITION BY Client ORDER BY CheckDate),
    (SELECT SUM(Amount) FROM dbo.Checks WHERE Client = c.Client 
      AND CheckDate <= c.CheckDate),
    (SELECT SUM(Amount) FROM dbo.Checks WHERE Client = c.Client)
FROM dbo.Checks AS c
)
SELECT Client, [Count] = rn, 
  [Running Total] = CONVERT(DECIMAL(5,2), rt * 100.0/gt)
  FROM x
  ORDER BY Client, [Count];

While I've offered set-based alternatives here, in my experience I have observed that a cursor is often the fastest supported way to perform running totals. There are other methods such as the quirky update which perform about marginally faster but the result is not guaranteed. The set-based approach where you perform a self-join becomes more and more expensive as the source row counts go up - so what seems to perform okay in testing with a small table, as the table gets larger, the performance goes down.

I have a blog post almost fully prepared that goes through a slightly simpler performance comparison of various running totals approaches. It is simpler because it is not grouped and it only shows the totals, not the running total percentage. I hope to publish this post soon and will try to remember to update this space.

There is also another alternative to consider that doesn't require reading previous rows multiple times. It's a concept Hugo Kornelis describes as "set-based iteration." I don't recall where I first learned this technique, but it makes a lot of sense in some scenarios.

DECLARE @c TABLE
(
 Client VARCHAR(32), 
 CheckDate DATETIME,
 Amount DECIMAL(12,2),
 rn INT,
 rt DECIMAL(15,2)
);

INSERT @c SELECT Client, CheckDate, Amount,
  ROW_NUMBER() OVER (PARTITION BY Client
 ORDER BY CheckDate), 0
 FROM dbo.Checks;

DECLARE @i INT, @m INT;
SELECT @i = 2, @m = MAX(rn) FROM @c;

UPDATE @c SET rt = Amount WHERE rn = 1;

WHILE @i <= @m
BEGIN
    UPDATE c SET c.rt = c2.rt + c.Amount
      FROM @c AS c
      INNER JOIN @c AS c2
      ON c.rn = c2.rn + 1
      AND c.Client = c2.Client
      WHERE c.rn = @i;

    SET @i = @i + 1;
END

SELECT Client, [Count] = rn, [Running Total] = CONVERT(
  DECIMAL(5,2), rt*100.0 / (SELECT TOP 1 rt FROM @c
 WHERE Client = c.Client ORDER BY rn DESC)) FROM @c AS c;

While this does perform a loop, and everyone tells you that loops and cursors are bad, one gain with this method is that once the previous row's running total has been calculated, we only have to look at the previous row instead of summing all prior rows. The other gain is that in most cursor-based solutions you have to go through each client and then each check. In this case, you go through all clients' 1st checks once, then all clients' 2nd checks once. So instead of (client count * avg check count) iterations, we only do (max check count) iterations. This solution doesn't make much sense for the simple running totals example, but for the grouped running totals example it should be tested against the set-based solutions above. Not a chance it will beat Steve's approach, though, if you are on SQL Server 2012.

UPDATE

I've blogged about various running totals approaches here:

http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals

I made some comments about these approaches on [Tracy McKibben's blog](http://www.real-sql-guy.com/2012/02/20-abuses-procedurally-going-in-circles.html) in February. And yes, I was talking about the same blog post. :-( — Aaron Bertrand, Apr 29 '12 at 02:30
To be pedantic, the first two could be re-written as sub-queries without CTEs, but what's stopping them from working in SQL Server 2000 is ROW_NUMBER / OVER. — Aaron Bertrand, Apr 29 '12 at 13:36

score 0 · Answer 3 · answered Apr 28 '12 at 23:45

I didn't exactly understand the schema you were pulling from, but here is a quick query using a temp table that shows how to do a running total in a set based operation.

CREATE TABLE #Checks
(
     ID int IDENTITY(1,1) PRIMARY KEY
    ,CompanyID int NOT NULL
    ,Amount float NOT NULL
    ,Cleared datetime NOT NULL
)

INSERT INTO #Checks
VALUES
     (1,5,'4/1/12')
    ,(1,5,'4/2/12')
    ,(1,7,'4/5/12')
    ,(2,10,'4/3/12')

SELECT Info.ID, Info.CompanyID, Info.Amount, RunningTotal.Total, Info.Cleared
FROM
(
SELECT main.ID, SUM(other.Amount) as Total
FROM
    #Checks main
JOIN
    #Checks other
ON
    main.CompanyID = other.CompanyID
AND
    main.Cleared >= other.Cleared
GROUP BY
    main.ID) RunningTotal
JOIN
    #Checks Info
ON
    RunningTotal.ID = Info.ID

DROP TABLE #Checks

Multiple Running Totals with Group By

----------------------------------

3 Answers3

Linked