1

I'm aware that there are several answers on SO about cumulative totals. I have experimented and have not found a solution to my problem.

Here is a sqlfiddle.

We have a contacts table with two fields, eid and create_time:

eid create_time
991772  April, 21 2016 11:34:21
989628  April, 17 2016 02:19:57
985557  April, 04 2016 09:56:39
981920  March, 30 2016 11:03:12
981111  March, 30 2016 09:36:48

I would like to select the number of new contacts in each month along with the size of our contacts database at the end of each month. New contacts by year and month is simple enough. For the size of the contacts table at the end of each month I did some research and found what looked to be a straight forwards method:

set @csum = 0;
select
   year(c.create_time) as yr,
   month(c.create_time) as mth,
   count(c.eid) as new_contacts,
   (@csum + count(c.eid)) as cumulative_contacts
from
  contacts c
group by
  yr,
  mth

That runs but gives me unexpected results.

If I run:

select count(*) from contacts where date(create_time) < current_date

I get the total number of records in the table 146.

I therefore expected the final row in my query using @csum to have 146 for April 2016. It has only 3?

What my goal is for field cumulative_contacts: For the record with e.g. January 2016.

select count(*) from contacts where date(create_time) < '2016-02-01';

And the record for February would have:

select count(*) from contacts where date(create_time) < '2016-03-01';

And so on

Community
  • 1
  • 1
Doug Fir
  • 19,971
  • 47
  • 169
  • 299

3 Answers3

1

Try this: fiddele

Here you have a "greater than or equal" join, so each group "contains" all previous values. Times 12 part, converts the hole comparation to months. I did offer this solution as it is not MySql dependant. (can be implemented on many other DBs with minimun or no changes)

select dates.yr, dates.mth, dates.new_contacts, sum(NC.new_contacts) as cumulative_new_contacts
from (
    select
       year(c.create_time) as yr,
       month(c.create_time) as mth,
       count(c.eid) as new_contacts
    from
      contacts c
    group by
      year(c.create_time),
      month(c.create_time)
) as dates
left join 
(
    select
       year(c.create_time) as yr,
       month(c.create_time) as mth,
       count(c.eid) as new_contacts
    from
      contacts c
    group by
      year(c.create_time),
      month(c.create_time)
) as NC
on dates.yr*12+dates.mth >= NC.yr*12+NC.mth
group by 
    dates.yr, 
    dates.mth,
    dates.new_contacts   -- not needed by MySql, present here for other DBs compatibility
order by 1,2
Saic Siquot
  • 6,513
  • 5
  • 34
  • 56
  • Thanks for taking the time to answer. I went with Reno's answer for no particularly good reason since I assume both do the job. I had to nit pick and found Reno's answer slightly more readable. I have never seen syntax like this dates.yr*12+dates.mth>=NC.yr*12+NC.mth - the times12 part.. what's that doing? Just curious? – Doug Fir Apr 28 '16 at 06:58
  • Edited to answer that. you can upvote if it was helpful or teach something to you – Saic Siquot Apr 28 '16 at 13:13
1

Try this, a bit of modification from your sql;)

CREATE TABLE IF NOT EXISTS `contacts` (
  `eid` char(50) DEFAULT NULL,
  `create_time` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;

INSERT INTO `contacts` (`eid`, `create_time`) VALUES
('991772', '2016-04-21 11:34:21'),
('989628', '2016-04-17 02:19:57'),
('985557', '2016-04-04 09:56:39'),
('981920', '2016-03-30 11:03:12'),
('981111', '2016-03-30 09:36:48');

SET @csum = 0;
SELECT t.*, @csum:=(@csum + new_contacts) AS cumulative_contacts
FROM (
    SELECT YEAR(c.create_time) AS yr, MONTH(c.create_time) AS mth, COUNT(c.eid) AS new_contacts
    FROM contacts c
    GROUP BY yr, mth) t

Output results is

| yr   | mth | new_contacts | cumulative_contacts | 
 ------ ----- -------------- ---------------------
| 2016 | 3   | 2            | 2                   | 
| 2016 | 4   | 3            | 5                   | 
Blank
  • 12,308
  • 1
  • 14
  • 32
  • Worked, thank you. I sorted the subquery in order of year, month asc too since otherwise this would fail. Thanks again – Doug Fir Apr 28 '16 at 06:59
1

This sql will get the cumulative sum and is pretty efficient. It numbers each row first and then uses that as the cumulative sum.

SELECT s1.yr, s1.mth, s1.new_contacts, s2.cummulative_contacts
FROM
   (SELECT
       YEAR(create_time) AS yr,
       MONTH(create_time) AS mth,
       COUNT(eid) AS new_contacts,
       MAX(eid) AS max_eid
    FROM
       contacts
    GROUP BY
       yr,
       mth
    ORDER BY create_time) s1 INNER JOIN
       (SELECT eid, (@sum:=@sum+1) AS cummulative_contacts
    FROM
       contacts INNER JOIN
       (SELECT @sum := 0) r
    ORDER BY create_time) s2 ON max_eid=s2.eid;

--Result sample--
|   yr | mth | new_contacts | cumulative_contacts |
|------|-----|--------------|---------------------|
| 2016 |   1 |            4 |                 132 |
| 2016 |   2 |            4 |                 136 |
| 2016 |   3 |            7 |                 143 |
| 2016 |   4 |            3 |                 146 |
  • I guess this would work if we assume that the contacts db and eid field are in order. Thanks for taking the time to answer – Doug Fir Apr 28 '16 at 06:58