0

I'm working on a query that produces a running sum of the size column grouped by created and type, and also considers the deleted date. So I don't mean just a sum for that date, but a running sum. If I put an apple in the bucket on Monday (total for Monday = 1). If I put another apple in on Tuesday (total for Tuesday = 2), ...
I'm just stumped on how to start.

Here's some sample data:

[id,type, size, created, deleted]  
1,A,    2,  2014-08-05, 2014-08-06  
2,A,    3,  2014-08-05,   
3,A,    5,  2014-08-06,       
4,A,    4,  2014-08-06,      
5,B,    2,  2014-08-06, 2014-08-06  
6,B,    4,  2014-08-07, 2014-08-07  
7,C,    6,  2014-08-07, 2014-08-07  
8,C,    4,  2014-08-07,       
9,D,    3,  2014-08-07,       
10,E,   5,  2014-08-07,       
11,C,   6,  2014-08-07,       

Results should look like this:

[Date, Type, Sum]  
2014-08-05,A,5  (sum of IDs 1,2)  
2014-08-06,A,14 (sum of IDs 1,2,3,4)  
2014-08-06,B,2  
2014-08-07,A,11 (sum of IDs 2,3,4: Notice that ID 1 is not included because it was deleted 2014-08-06)  
2014-08-07,B,6 (sum of IDs 5,6)  
2014-08-07,C,16 (sum of IDs 7,8,11)  
etc...
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
bopritchard
  • 369
  • 3
  • 15
  • 1
    `2014-08-06,A,14` should be `2014-08-06,A,12`, because of row 1. Or do you include the day a row is deleted? And `2014-08-07,A,11` is simply wrong. Please fix your question. – Erwin Brandstetter Aug 12 '14 at 22:26

1 Answers1

0
SELECT DISTINCT day, type
     , sum(size) OVER (PARTITION BY type ORDER BY day) AS running_total
FROM (
   SELECT created AS day, type, size
   FROM   tbl

   UNION  ALL
   SELECT deleted, type, size * -1
   FROM   tbl
   WHERE  deleted IS NOT NULL
   ) sub
ORDER BY 1,2;

Returns:

day         type  running_total
-------------------------------
2014-08-05  A     5
2014-08-06  A     12
2014-08-06  B     0
2014-08-07  B     0
2014-08-07  C     10
2014-08-07  D     3
2014-08-07  E     5

Explain:

  • Take every row and with numbers as given for (created, type).
  • Do the same for (deleted, type), where the row has been deleted.
  • Compute the running total with the window aggregate function sum() OVER ...
  • Since this can produce multiple identical rows per day, apply DISTINCT to fold dupes.

If deletion should only become effective the next day, simply add 1 to the deleted date:

...
   SELECT deleted + 1, type, size * -1
...

Faster

Same result, more verbose, but faster:

SELECT day, type
     , sum(daily_delta) OVER (PARTITION BY type ORDER BY day) AS running_total
FROM  (
    SELECT day, type, sum(size) AS daily_delta
    FROM (
        SELECT created AS day, type, size
        FROM   tbl

        UNION  ALL
        SELECT deleted, type, size * -1
        FROM   tbl
        WHERE  deleted IS NOT NULL
        ) sub
    GROUP  BY  1,2
    ) sub
ORDER BY 1,2;

Closely related answer with detailed explanation:

SQL Fiddle.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228