Count the number of rows in 30 day bins

Question

Each row in my table has a date time stamp, and I wish to query the database from now, to count how many rows are in the last 30 days, the 30 days before that and so on. Until there is a 30 day bin going back to the start of the table.

I have successfully carried out this query by using Python and making several queries. But I'm almost certain that it can be done in one single MySQL query.

Explain with table structurer with some inut data and output data — Saharsh Shah, Jan 03 '13 at 17:57
what do you mean with "30 day bin" ? I think my answer should be okay, from what I can understand, but without some sample data I can't be sure it does exacty what you need — fthiella, Jan 04 '13 at 19:29
When you say _rolling_ 30 day bins, do you mean non-overlapping 30 day bins, or overlapping 30 days bins? For example, if the first bin is from 2012-12-02 to 2012-12-31, is the next bin from 2012-11-02 to 2012-11-01, or is it from 2012-12-01 to 2012-12-30? — Phil Frost, Jan 05 '13 at 02:26
Does anyone else think that "rolling" implies overlapping bins? I'm reluctant to propose an edit because I'm not sure if that understanding is universal, but I know it this way mostly from finance, as in "rolling returns". — Phil Frost, Jan 06 '13 at 13:36
@PhilFrost you're correct, it could be confusing. I've edited the title. — seanieb, Jan 07 '13 at 15:16
@seanieb accepted answer doesn't look much different from my second query, and i posted that answer immediately... i only mispelled a `d` which actually is a `date` ... but nvm :) — fthiella, Jan 07 '13 at 20:36
@fthiella Sorry, I overlooked your answer. I've started a new bounty and will award you an additional 100 points in 23 hrs. I'm going to keep Phils as the answer as it's very clearly done. — seanieb, Jan 08 '13 at 00:52
@seanieb thanks a lot! you didn't have to... but I appreciate it! thanks again! — fthiella, Jan 09 '13 at 19:13

fthiella · Answer 1 · 2013-01-04T19:53:28.773

If you just need to count intervals where there's at least one row, you could use this:

select
  datediff(curdate(), `date`) div 30 as block,
  count(*) as rows_per_block
from
  your_table
group by
  block

And this also shows the start date and the end date:

select
  datediff(curdate(), d) div 30 as block,
  date_sub(curdate(),
           INTERVAL (datediff(curdate(), `date`) div 30)*30 DAY) as start_block,
  date_sub(curdate(),
           INTERVAL (1+datediff(curdate(), `date`) div 30)*30-1 DAY) as end_block,
  count(*)
from your_table
group by block

but if you also need to show all intervals, you could use a solution like this:

select
  num,
  date_sub(curdate(),
           INTERVAL (num+1)*30-1 DAY) as start_block,
  date_sub(curdate(),
           INTERVAL num*30 DAY) as end_block,
  count(`date`)
from
  numbers left join your_table
  on `date` between date_sub(curdate(),
           INTERVAL (num+1)*30-1 DAY)  and
  date_sub(curdate(),
           INTERVAL num*30 DAY)
where num<=(datediff(curdate(), (select min(`date`) from your_table) ) div 30)
group by num

but this requires that you have a numbers table already prepared, or see fiddle here for a solution without numbers table.

I assume someone downvoted me based on my initial answer. The latter update seems to be more inline with what OP needs though. Please comment to improve if you see any flaws on my answer :) — bonCodigo, Dec 30 '12 at 13:09

score 3 · Accepted Answer · edited May 23 '17 at 12:03

No stored procedures, temporary tables, only one query, and an efficient execution plan given an index on the date column:

select

  subdate(
    '2012-12-31',
    floor(dateDiff('2012-12-31', dateStampColumn) / 30) * 30 + 30 - 1
  ) as "period starting",

  subdate(
    '2012-12-31',
    floor(dateDiff('2012-12-31', dateStampColumn) / 30) * 30
  ) as "period ending",

  count(*)

from
  YOURTABLE
group by floor(dateDiff('2012-12-31', dateStampColumn) / 30);

It should be pretty obvious what is happening here, except for this incantation:

floor(dateDiff('2012-12-31', dateStampColumn) / 30)

That expression appears several times, and it evaluates to the number of 30-day periods ago dateStampColumn is. dateDiff returns the difference in days, divide it by 30 to get it in 30-day periods, and feed it all to floor() to round it to an integer. Once we have this number, we can GROUP BY it, and further we do a bit of math to translate this number back into the starting and ending dates of the period.

Replace '2012-12-31' with now() if you prefer. Here's some sample data:

CREATE TABLE YOURTABLE
    (`Id` int, `dateStampColumn` datetime);

INSERT INTO YOURTABLE
    (`Id`, `dateStampColumn`)
VALUES
    (1, '2012-10-15 02:00:00'),
    (1, '2012-10-17 02:00:00'),
    (1, '2012-10-30 02:00:00'),
    (1, '2012-10-31 02:00:00'),
    (1, '2012-11-01 02:00:00'),
    (1, '2012-11-02 02:00:00'),
    (1, '2012-11-18 02:00:00'),
    (1, '2012-11-19 02:00:00'),
    (1, '2012-11-21 02:00:00'),
    (1, '2012-11-25 02:00:00'),
    (1, '2012-11-25 02:00:00'),
    (1, '2012-11-26 02:00:00'),
    (1, '2012-11-26 02:00:00'),
    (1, '2012-11-24 02:00:00'),
    (1, '2012-11-23 02:00:00'),
    (1, '2012-11-28 02:00:00'),
    (1, '2012-11-29 02:00:00'),
    (1, '2012-11-30 02:00:00'),
    (1, '2012-12-01 02:00:00'),
    (1, '2012-12-02 02:00:00'),
    (1, '2012-12-15 02:00:00'),
    (1, '2012-12-17 02:00:00'),
    (1, '2012-12-18 02:00:00'),
    (1, '2012-12-19 02:00:00'),
    (1, '2012-12-21 02:00:00'),
    (1, '2012-12-25 02:00:00'),
    (1, '2012-12-25 02:00:00'),
    (1, '2012-12-26 02:00:00'),
    (1, '2012-12-26 02:00:00'),
    (1, '2012-12-24 02:00:00'),
    (1, '2012-12-23 02:00:00'),
    (1, '2012-12-31 02:00:00'),
    (1, '2012-12-30 02:00:00'),
    (1, '2012-12-28 02:00:00'),
    (1, '2012-12-28 02:00:00'),
    (1, '2012-12-30 02:00:00');

And the result:

period starting     period ending   count(*)
2012-12-02          2012-12-31      17
2012-11-02          2012-12-01      14
2012-10-03          2012-11-01      5

period endpoints are inclusive.

Play with this in SQL Fiddle.

There's a bit of potential goofiness in that any 30 day period with zero matching rows will not be included in the result. If you could join this against a table of periods, that could be eliminated. However, MySQL doesn't have anything like PostgreSQL's generate_series(), so you'd have to deal with it in your application or try this clever hack.

score 2 · Answer 3 · answered Dec 30 '12 at 11:02

2

Try this:

SELECT 
  DATE_FORMAT(t1.`Date`, '%Y-%m-%d'),
  COUNT(t2.Id)
FROM 
(
  SELECT SUBDATE(CURDATE(), ID) `Date`
  FROM
  (
    SELECT  t2.digit * 10 + t1.digit + 1 AS id
    FROM         TEMP AS t1
    CROSS JOIN TEMP AS t2
  ) t 
  WHERE Id <= 30 
) t1
LEFT JOIN YOURTABLE t2 ON DATE(t1.`Date`) = DATE(t2.dateStampColumn)
GROUP BY t1.`Date`;

SQL Fiddle Demo

But, you will need to create a temp table Temp like so:

CREATE TABLE TEMP 
(Digit int);
INSERT INTO Temp VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

answered Dec 30 '12 at 11:02

Mahmoud Gamal

78,257
17
139
164

it is better to use Date() than Date_Format() so there is still no problem when comparing with date ranges :) eg Date(t1.Date) :) – John Woo Dec 30 '12 at 14:22
@JW. Yes, you are right. I just used it for formatting in the outpur. – Mahmoud Gamal Dec 31 '12 at 18:49
That can't be right, by the SQL Fiddle results. There are clearly plenty of 30-day windows in that data set with many matching rows, but you never show more than two. – Phil Frost Jan 05 '13 at 02:28
@PhilFrost Thanks for the critique. Can you please tell how this never show more than two? This query will give you the last previous 30 days from now, with the `COUNT(id)` for each days, and it will show the days that has no entries for this date in this table. Try this [**updated fiddle**](http://www.sqlfiddle.com/#!2/032b6/1), I just updated the sample data you will find the date `2012-12-06` has 6 entries in the sample data therefore it will have count = 6, if this what you meant. – Mahmoud Gamal Jan 05 '13 at 08:14
@MahmoudGamal: as I read the question, it's asking for the count of rows in a 30-day window, like between 2012-12-02 and 2012-12-31, but your query is counting just the number of rows for each (single) day. – Phil Frost Jan 05 '13 at 13:08
@PhilFrost Yes, the query will list all the days from curdate() to the previous 30 days, with the count for each day, Thats what I understood from the question. – Mahmoud Gamal Jan 05 '13 at 13:37

score 0 · Answer 4 · edited Dec 30 '12 at 13:50

Could you please try the following:

SELECT Count(*)
FROM
  yourtable
where
  dateColumn between Now() and Now() - Interval 30 Day

It needs some looping, for a better answer to isolote all 30 days intervals going back. As you also need a 30 day interval between min(Date) in the table and the last loop date :) Or to the least another table that carries the dates of each 30 day interval, and then join.

Here is getting count just by each calendar month. Not exactly what you need.

SELECT
  extract(month from datecolumn),
  count(*)
FROM
  yourtable
GROUP BY
  extract(month from datecolumn);

Given a thought to my latter comment and Stefan's comment, here is a long code yet with proper resutls. Based on my own sample data and compatible with MYSQL with interval. If you need to use with SQL Server please use DateADD or quivalent function.

SQLFIDDLE

Sample data:

ID_MAIN  FIELD1  FILTER
----------------------------------------
1        red     August, 05 2012 00:00:00+0000
2        blue    September, 15 2012 00:00:00+0000
3        pink    September, 20 2012 00:00:00+0000
4        blue    September, 27 2012 00:00:00+0000
5        blue    October, 02 2012 00:00:00+0000
6        blue    October, 16 2012 00:00:00+0000
7        blue    October, 22 2012 00:00:00+0000
8        pink    November, 12 2012 00:00:00+0000
9        pink    November, 28 2012 00:00:00+0000
10       pink    December, 01 2012 00:00:00+0000
11       pink    December, 08 2012 00:00:00+0000
12       pink    December, 22 2012 00:00:00+0000

Query:

set @i:= 0;
SELECT MIN(filter) INTO @mindt
FROM MAIN
;
select
  count(a.id_main),
  y.dateInterval,
  (y.dateInterval - interval 29 day) as lowerBound
from
  main a join (
    SELECT date_format(Now(),'%Y-%m-%d') as dateInterval
    from dual
    union all
    select x.dateInterval
    from (
      SELECT
        date_format(
          DATE(DATE_ADD(Now(),
                        INTERVAL @i:=@i-29 DAY)),'%Y-%m-%d') AS dateInterval
      FROM Main, (SELECT @i:=0) r
      HAVING datediff(dateInterval,@mindt) >= 30
      order by dateInterval desc) as x) as y
  on a.filter <= y.dateInterval 
     and a.filter > (y.dateInterval - interval 29 day)
group by y.dateInterval
order by y.dateInterval desc
;

Results:

COUNT(A.ID_MAIN)    DATEINTERVAL    LOWERBOUND
----------------------------------------------
2                   2012-12-30  2012-12-01
3                   2012-12-01  2012-11-02
2                   2012-11-02  2012-10-04
4                   2012-10-04  2012-09-05

NOW() should be replaced by some variable, because seanieb also wants to get the 30-day interval 60 days ago, 90 days ago and so on. — Stefan Neubert, Dec 30 '12 at 10:51
Thanks @Stefan Perhaps `Extract(month` could be a better idea ;) — bonCodigo, Dec 30 '12 at 10:58
Doing it by month is trivial. The question is explicitly about rolling 30 day buckets/intervals. — seanieb, Dec 30 '12 at 12:41
@seanieb since you haven't accepted Gamal's answer I have updated my post. — bonCodigo, Dec 30 '12 at 13:05
@bonCodigo i didn't downvote your answer, I only edited it to make it look better, but I get different intervals than yours: (0, '2012-12-01', '2012-12-30'), (1, '2012-11-01', '2012-11-30'), (2, '2012-10-02', '2012-10-31'), etc. so the counts don't match, and you are counting 11 rows but there actually are 12 — fthiella, Dec 30 '12 at 14:08
@fthiella I didn't point at you for the downvote :) Well my intervals are 29 days apart. Coz', `interval` counts first date and 29 days when adding up. By right, it shouldn't count August row, as it's not within a 30 day bin between minimum date(first record) and one after that. `2012-09-05 to 2012-08-05` are 31 days. — bonCodigo, Dec 30 '12 at 15:09
@bonCodigo :) i'm counting rows in blocks of 30 days, showing the counts for all blocks from the first date to the current date, but I am not 100% sure that this is what the OP is after... even the other answer returns something different... — fthiella, Dec 30 '12 at 17:13

Saharsh Shah · Answer 5 · 2013-01-08T13:05:57.050

Create a stored procedure to count number of rows by 30 days.

First run this procedure and then call the same procedure when you want to genrate data.

DELIMITER $$

DROP PROCEDURE IF EXISTS `sp_CountDataByDays`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `sp_CountDataByDays`()
BEGIN 
    CREATE TEMPORARY TABLE daterange (
            id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, 
            fromDate DATE, 
            toDate DATE, 
            PRIMARY KEY (`id`)
    ); 

    SELECT DATEDIFF(CURRENT_DATE(), dteCol) INTO @noOfDays 
    FROM yourTable ORDER BY dteCol LIMIT 1;

    SET @counter = -1;
    WHILE (@noOfDays > @counter) DO 
        INSERT daterange (toDate, fromDate) 
        VALUES (DATE_SUB(CURRENT_DATE(), INTERVAL @counter DAY), DATE_SUB(CURRENT_DATE(), INTERVAL @counter:=@counter + 30 DAY));
    END WHILE;

    SELECT d.id, d.fromdate, d.todate, COUNT(d.id) rowcnt 
    FROM daterange d  
    INNER JOIN yourTable a ON a.dteCol BETWEEN d.fromdate AND d.todate 
    GROUP BY d.id;

    DROP TABLE daterange;
END$$

DELIMITER ;

Then CALL the procedure:

CALL sp_CountDataByDays();

You get the output as below:

ID  From Date   To Date     Row Count
1   2012-12-06  2013-01-05  17668
2   2012-11-06  2012-12-06  2845
3   2012-10-07  2012-11-06  2276
4   2012-09-07  2012-10-07  4561
5   2012-08-08  2012-09-07  5415
6   2012-07-09  2012-08-08  8954
7   2012-06-09  2012-07-09  4387
8   2012-05-10  2012-06-09  7911
9   2012-04-10  2012-05-10  7935
10  2012-03-11  2012-04-10  2566

Count the number of rows in 30 day bins

5 Answers5

SQL Fiddle Demo

Linked