3

Is there a way in mysql to group by month, but with custom starting dates.

Say I want to count logins in a monthly basis, but with the condition that the month starts when a user register.

So for example user A registered on January 30th and user B on January 15th

I should group the logins as follow:

* User A: January 30th - February 28th, March 1st - March 30th, March 31 - April 30 and so on and so forth 
* User B: January 15th - February 14th, February 15th - March 14th and so on and so forth

I guess I need to use something like DATE_ADD('2013-01-30', INTERVAL 1 MONTH); but I can not seem to find a way to make the grouping.

UPDATE

@GarethD: You are right that was a typo

In general the month should start at the same day of the next month or the last day of the next month in case that the first is not possible, so if you registered in day 31, the month period would start in day 30 for months that does not have 31 days and the last day of February either 28 or 29

Example:

Given that

id 1 registered on 2012-12-16
id 2 registered on 2013-01-29

and the following table

+----+------------+
| id |    date    |
+----+------------+
|  1 | 2013-01-15 |
|  1 | 2013-01-16 |
|  1 | 2013-01-17 |
|  1 | 2013-01-17 |
|  2 | 2013-03-20 |
|  2 | 2013-03-21 |
|  2 | 2013-03-28 |
|  2 | 2013-03-29 |
|  2 | 2013-03-30 |
+----+------------+

the results should be

+----+----------------------------+-------+
| id |           range            | count |
+----+----------------------------+-------+
| 1  | 2012-12-16, 2013-01-15     |     1 |
| 1  | 2013-01-16, 2013-02-15     |     3 |
| 2  | 2013-02-2[8|9], 2013-03-28 |     3 |
| 2  | 2013-03-29, 2013-04-28     |     2 |
+----+----------------------------+-------+

I hope the intent is clearer now.

Cesar
  • 4,076
  • 8
  • 44
  • 68
  • Your example in itself is flawed, why does user A not go from March 1st to 30th, then 31st March to 30th April? i.e. maintaining the start date of the 30th with the exception of February that does not have 30 days? What would happen if user A registered on 28th January? Would it still go to 1st March to 31st March after 28th February? Can you provide more example data and an expected output? – GarethD Sep 04 '13 at 18:03
  • @GarethD, I've made some updates to the question, hope is clearer now. – Cesar Sep 04 '13 at 20:37

1 Answers1

4

For the following I am assuming you already have a numbers table, If you don't have a numbers table, then I'd recommend you make one then, but if you don't want to then you can create a number list on the fly

You can get a list of all boundaries by cross joining your userID and registered dates with your numbers table:

SELECT  u.ID, 
        DATE_ADD(RegisteredDate, INTERVAL n.Number MONTH) PeriodStart,
        DATE_ADD(RegisteredDate, INTERVAL n.Number + 1 MONTH) PeriodEnd
FROM    User u
        CROSS JOIN Numbers n;

This gives a table like:

ID  PERIODSTART     PERIODEND
1   2012-12-16      2012-12-16
2   2013-01-29      2013-01-29
1   2013-01-16      2013-01-16
2   2013-02-28      2013-02-28

Example on SQL Fiddle

You then need to join this to your main table, and do the count:

SELECT  u.ID,
        u.PeriodStart,
        DATE_ADD(PeriodEnd, INTERVAL -1 DAY) PeriodEnd,
        COUNT(*) AS `COUNT`
FROM    (   SELECT  u.ID, 
                    DATE_ADD(RegisteredDate, INTERVAL n.Number MONTH) PeriodStart,
                    DATE_ADD(RegisteredDate, INTERVAL n.Number + 1 MONTH) PeriodEnd
            FROM    User u
                    CROSS JOIN Numbers n
        ) u
        INNER JOIN T
            ON T.ID = u.ID
            AND T.Date >= u.PeriodStart
            AND T.Date < PeriodEnd
GROUP BY u.ID, u.PeriodStart, u.PeriodEnd;

Giving a final result of:

ID  PERIODSTART     PERIODEND   COUNT
1   2012-12-16      2013-01-15  1
1   2013-01-16      2013-02-15  3
2   2013-02-28      2013-03-28  3
2   2013-03-29      2013-04-28  2

Full Example on SQL-Fiddle

You can obviously concatenate your period start and end dates to make a 'range' string, but this is probably best handled in your application layer.

EDIT

This can be achieved with no subqueries which is likely to perform better:

SELECT  u.ID,
        DATE_ADD(u.RegisteredDate, INTERVAL n.Number MONTH) PeriodStart,
        DATE_ADD(DATE_ADD(u.RegisteredDate, INTERVAL n.Number + 1 MONTH), INTERVAL -1 DAY) PeriodEnd,
        COUNT(*) AS `COUNT`
 FROM   User u
        CROSS JOIN Numbers n 
        INNER JOIN T
            ON T.ID = u.ID
            AND T.Date >= DATE_ADD(u.RegisteredDate, INTERVAL n.Number MONTH)
            AND T.Date < DATE_ADD(u.RegisteredDate, INTERVAL n.Number + 1 MONTH)
GROUP BY u.ID, u.RegisteredDate, n.Number;

Example with no subquery on SQL-Fiddle

EDIT 2

This will get you all periods for all users up until the current period (i.e. where today falls within the date range)

SELECT  u.ID,
        DATE_ADD(u.RegisteredDate, INTERVAL n.Number MONTH) PeriodStart,
        DATE_ADD(DATE_ADD(u.RegisteredDate, INTERVAL n.Number + 1 MONTH), INTERVAL -1 DAY) PeriodEnd,
        COUNT(T.ID) AS `COUNT`
 FROM   User u
        CROSS JOIN Numbers n 
        LEFT JOIN T
            ON T.ID = u.ID
            AND T.Date >= DATE_ADD(u.RegisteredDate, INTERVAL n.Number MONTH)
            AND T.Date < DATE_ADD(u.RegisteredDate, INTERVAL n.Number + 1 MONTH)
WHERE   DATE_ADD(u.RegisteredDate, INTERVAL n.Number + 1 MONTH) <= CURRENT_TIMESTAMP
GROUP BY u.ID, u.RegisteredDate, n.Number;

Example on SQL Fiddle

Community
  • 1
  • 1
GarethD
  • 68,045
  • 10
  • 83
  • 123
  • 1
    thanks man, that was helpful, one question though, where do I learn this things? ;~) – Cesar Sep 05 '13 at 16:37
  • Ha, there is things we all can't do, unfortunately with programming you only realise what you don't know how to do when you need to do it. The more you do, the more you learn, and sooner or later you have seen most problems before, or at least similar ones. – GarethD Sep 05 '13 at 16:47
  • GarethD, one more question. Would it be too difficult to change this query to show a count of 0 for the intervals of the users who does not have entries in table `T`? – Cesar Sep 05 '13 at 17:04
  • No not really, change the join on 'T' to a LEFT JOIN, change the COUNT (*) to COUNT (T.ID). You may need to limit the results though as this will return a lot more results. I can't get to a computer now, but I'll update the answer tomorrow to show what I mean. – GarethD Sep 05 '13 at 17:48