4

I have a table look like below....

ID HID  Date          UID
 1  1   2012-01-01    1002
 2  1   2012-01-24    2005
 3  1   2012-02-15    5152
 4  2   2012-01-01    6252
 5  2   2012-01-19    10356
 6  3   2013-01-06    10989
 7  3   2013-03-25    25001
 8  3   2014-01-14    35798 

How can i group by HID, Year, Month and count(UID) and add a cumulative_sum (which is count of UID). So the final result look like this...

HID  Year   Month   Count  cumulative_sum
1   2012   01      2      2
1   2012   02      1      3
2   2012   01      2      2
3   2013   01      1      1
3   2013   03      1      2
3   2014   01      1      3   

What's the best way to accomplish this using query?

dCoder
  • 67
  • 2
  • 7
  • Is there a PRIMARY KEY? – Strawberry Sep 12 '14 at 11:28
  • possible duplicate of [Create a Cumulative Sum Column in MySQL](http://stackoverflow.com/questions/2563918/create-a-cumulative-sum-column-in-mysql) – unutbu Sep 12 '14 at 11:32
  • @Strawberry: sorry just update my initial table. ID is the primary key. I can write the basic query but struggling to produce above result. Any suggestion or solution with an example? – dCoder Sep 12 '14 at 11:37
  • @unutbu: I looked the link but it didn't help as accepted. – dCoder Sep 12 '14 at 11:38
  • @dCoder: You're right; that solution does not work here -- at least not "out-of-the-box". – unutbu Sep 12 '14 at 12:07

2 Answers2

5

I made assumptions about the original data set. You should be able to adapt this to the revised dataset - although note that the solution using variables (instead of my self-join) is faster...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(ID INT NOT NULL  
,Date DATE NOT NULL          
,UID INT NOT NULL PRIMARY KEY
);

INSERT INTO my_table VALUES
(1   ,'2012-01-01',    1002),
(1   ,'2012-01-24',    2005),
(1   ,'2012-02-15',    5152),
(2   ,'2012-01-01',    6252),
(2   ,'2012-01-19',    10356),
(3   ,'2013-01-06',    10989),
(3   ,'2013-03-25',    25001),
(3   ,'2014-01-14',    35798);


SELECT a.*
     , SUM(b.count) cumulative 
  FROM
     (
       SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY  id,year,month
     ) a
  JOIN
     (
       SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY  id,year,month
     ) b
    ON b.id = a.id AND (b.year < a.year OR (b.year = a.year AND b.month <= a.month)
     )
 GROUP 
    BY a.id, a.year,a.month;
+----+------+-------+-------+------------+
| id | year | month | count | cumulative |
+----+------+-------+-------+------------+
|  1 | 2012 |     1 |     2 |          2 |
|  1 | 2012 |     2 |     1 |          3 |
|  2 | 2012 |     1 |     2 |          2 |
|  3 | 2013 |     1 |     1 |          1 |
|  3 | 2013 |     3 |     1 |          2 |
|  3 | 2014 |     1 |     1 |          3 |
+----+------+-------+-------+------------+

If you don't mind an extra column in the result, you can simplify (and accelerate) the above, as follows:

SELECT x.*
     , @running:= IF(@previous=x.id,@running,0)+x.count cumulative
     , @previous:=x.id
  FROM 
     ( SELECT x.id,YEAR(date) year,MONTH(date) month, COUNT(0) count FROM my_table x GROUP BY  id,year,month ) x
    ,( SELECT @cumulative := 0,@running:=0) vals;
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • Just another point to check, how to include null value '0' if any month doesn't have count(*)? Any idea. – dCoder Sep 15 '14 at 10:32
  • If you have some kind of presentation layer (e.g. PHP) then I'd handle the logic of missing results there. Otherwise you can construct an elaborate UNION for the missing the dates, or make a calendar table of all possble dates (depressingly small) and LEFT JOIN on to that. – Strawberry Sep 15 '14 at 11:02
  • 1
    I just noticed the 2 solution only work when i run the query again. Meaning it doesn't do cumulative when query run first time. is there any reason? or does it require any var to define? – dCoder Sep 18 '14 at 08:17
  • i tried using UNION for missing months to include '0' but unable to product results. Something needs to require an expert inputs for the above query. – dCoder Sep 19 '14 at 11:10
  • I am also facing the same issue. The query needs to be run twice and only on the second go it produces cummulative results. Any suggestions please help – Kousick Shanmugam Nagaraj Jan 24 '15 at 16:01
1

The code turns out kind of messy, and it reads as follows:

SELECT
    HID,
    strftime('%Y', `Date`) AS Year,
    strftime('%m', `Date`) AS Month,
    COUNT(UID) AS Count,
    (SELECT
        COUNT(UID)
    FROM your_db A
    WHERE
            A.HID=B.HID
        AND
                (strftime('%Y', A.`Date`) < strftime('%Y', B.`Date`)
            OR
                    (strftime('%Y', A.`Date`) = strftime('%Y', B.`Date`)
                AND
                    strftime('%m', A.`Date`) <= strftime('%m', B.`Date`)))) AS cumulative_count
FROM your_db B
GROUP BY HID, YEAR, MONTH

Though by using views, it should become much clearer:

CREATE VIEW temp_data AS SELECT
    HID,
    strftime('%Y', `Date`) as Year,
    strftime('%m', `Date`) as Month,
    COUNT(UID) as Count
FROM your_db GROUP BY HID, YEAR, MONTH;

Then your statement will read as follows:

SELECT
    HID,
    Year,
    Month,
    `Count`,
    (SELECT SUM(`Count`)
FROM temp_data A
WHERE
        A.HID = B.HID
    AND
            (A.Year < B.Year
        OR
                (A.Year = B.Year
            AND
                A.Month <= B.Month))) AS cumulative_sum
FROM temp_data B;
Hetzroni
  • 2,109
  • 1
  • 14
  • 29