3

I am trying to return a resultset from MySql which is grouped by the YEAR and MONTH, and which has a count returned for every YEAR/MONTH..

Here is where I started:

SELECT YEAR(p.pEndDate) AS pYear, MONTHNAME(p.pEndDate) AS pMonth, count(*) AS pNum 
FROM projects p
WHERE p.status=3
GROUP BY YEAR(p.pEndDate), MONTH(p.pEndDate)

This SQL basically does 90% of what I need, except in the case that there is a month where the count is zero. For example, in 2009 July had zero projects with a status of 3, so I am getting:

2008    November    1
2009    January     2
2009    February    2
2009    March   2
2009    April   1
2009    May 2
2009    June    3
2009    August  2
2009    September   1
2009    October 1
2009    November    2
2009    December    1
2010    January 4
2010    February    1
2010    March   1
2010    April   3
2010    May 3
2010    June    3
2010    July    3
2010    August  3
2010    September   3
2010    October 2
2010    November    2
2010    December    3
2011    January 2
2011    February    1

Notice how July is just not there.

So I started doing some research with using another table to force the resultset to include July. So I created a new table 'monthTable' and added two columns monthID int Primary Key, monthName VARCHAR(3).

I've tried many different ways of using this table, starting with a RIGHT JOIN and so on.. none have them have yielded successful results, in fact almost everything I do yields the same result set as above.

Any help would be greatly appreciated!

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
Steve
  • 340
  • 4
  • 16
  • See the results for the "datetime-generation" tag -- I've answered various interpretations of the same question in the past. You need to get a list of dates that is not missing entries, and LEFT JOIN your data to it. – OMG Ponies Mar 01 '11 at 18:09
  • @OMG Ponies - this is what I have done with the monthTable, however, when I use left join on the monthTable and the projects table, the same results are returned. – Steve Mar 01 '11 at 18:20

3 Answers3

1

I've tried many different ways of using this [monthTable] table, starting with a RIGHT JOIN and so on.. none have them have yielded successful results, in fact almost everything I do yields the same result set as above.

FROM projects p WHERE p.status=3

My guess is that you were trying something like this

FROM projects p
RIGHT JOIN monthTable m on <join p to m>
WHERE p.status=3`

The problem is that the WHERE clause will be filtering out any record that doesn't have any p.status values (null). You need to move such filters to the JOIN clause, like this

FROM projects p
RIGHT JOIN monthTable m on <join p to m> AND p.status=3`

Curious, but how does a table like suffice, esp with monthName being only varchar(3)?

monthID int Primary Key, monthName VARCHAR(3).

Try creating it like this instead (one-off)

DROP PROCEDURE IF EXISTS FillMonthsTable;

delimiter //
CREATE PROCEDURE FillMonthsTable()
    LANGUAGE SQL
    NOT DETERMINISTIC
    CONTAINS SQL
    SQL SECURITY DEFINER
    COMMENT ''
BEGIN
  drop table if exists monthsTable;
  create table monthsTable (theYear int, theMonth int, monthName varchar(20));

  SET @x := date('2000-01-01');
  REPEAT 
    insert into monthsTable (theyear, themonth, monthname) SELECT year(@x), month(@x), monthname(@x);
    SET @x := date_add(@x, interval 1 month);
    UNTIL @x > date('2030-01-01') END REPEAT;
END//
delimiter ;

CALL FillMonthsTable;

DROP PROCEDURE FillMonthsTable;

Then using this query (1-pass to group your data, then a left join to produce the 0s)

SELECT m.theYear, m.theMonth, IFNULL(t.pNum, 0) theCount
FROM monthsTable m
LEFT JOIN (
    SELECT YEAR(p.pEndDate) AS pYear, MONTH(p.pEndDate) AS pMonth, count(*) AS pNum 
    FROM projects p
    WHERE p.status=3
    GROUP BY YEAR(p.pEndDate), MONTH(p.pEndDate)
) t on t.pYear = m.theYear and t.pMonth = m.theMonth
ORDER BY m.theYear, m.theMonth
RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
  • thank you! A very good detailed answer, I'm afraid I can't +1 your reply as my rep is too low. But, thank you. i added the line WHERE m.theYear > '2007' AND m.theYear < '2012' to filter the results further. – Steve Mar 01 '11 at 19:14
0

Expanding on OMG Ponies statement, you need a Numbers or Tally table which includes a sequential list of integers that covers your months and years for all years on which you want to query.

Create Table Numbers ( Value int not null Primary Key )
Insert Numbers(Value) Values( 1 )
Insert Numbers(Value) Values( 2 )
...
Insert Numbers(Value) Values( 12 )
...
Insert Numbers(Value) Values( 2000 )
Insert Numbers(Value) Values( 2001 )
...
Insert Numbers(Value) Values( 2011 )
Insert Numbers(Value) Values( 2012 )

This would be a one-time insert and the table would remain static until such time as you needed more months or years. With that, we now Left Join your Projects table to the Numbers table:

Select Years.Value As PYear
    , Month_Name( Date_Add('2000-01-01', Interval Months.Value - 1 MONTH) ) As PMonth
    , Count( P.NonNullableCol ) As PNum
From Numbers As Months
    Cross Join Numbers As Years
    Left Join Projects As P
        On Year( P.PEnddate ) = Years.Value
            And Month( P.PEndDate ) = Months.Value
Where Months.Value Between 1 And 12
    And Years.Value Between 2008 And 2011
Group By Years.Value, Months.Value

Addition

Per comments, we are not told the nature of the underlying data. However, if the values in question were dates and not dates and times, then a faster approach would be a Calendar table rather than a Numbers table. Like a Numbers table, this would a static table of sequential dates covering the time period of dates in your Projects table.

Create Table Calendar ( DateValue date not null Primary Key )
Insert Calendar( DateValue ) Values( '2000-01-01' )
Insert Calendar( DateValue ) Values( '2000-01-02' )
Insert Calendar( DateValue ) Values( '2000-01-03' )
...
Insert Calendar( DateValue ) Values( '2011-03-01' )

Select Year( C.DateValue ) As PYear
    , Month( C.DateValue ) As PMonth
    , Count( P.NonNullableCol ) As PNum
From Calendar As C
    Left Join Projects As P
        On P.PEndDate = C.DateValue
Where C.DateValue Between '2008-11-01' And '2011-02-28'
Group By Year( C.DateValue ), Month( C.DateValue )
Thomas
  • 63,911
  • 12
  • 95
  • 141
  • That is a nasty query, with the YEAR(column) and MONTH(column) non-SARGable functions – RichardTheKiwi Mar 01 '11 at 18:36
  • @Richard aka cyberkiwi - With 36 rows it performs great ;->. If performance is an issue, the solution is to expand the Numbers table to be a Calendar table with all dates across the range and then group by Year and Month. – Thomas Mar 01 '11 at 18:41
  • Check the EXPLAIN plan. 36 rows as in output or 36 rows in table? The query you have will be attempting to join (and scan due to the functions) the two tables, and perform the GROUPing after the expensive join. I could be wrong but MySQL isn't smart with CROSS JOINs so you may get another expansion with that before it uses the WHERE clause. Group by on the base table takes a 1-pass scan and should be faster – RichardTheKiwi Mar 01 '11 at 18:52
  • @Richard aka cyberkiwi - As in 36 rows in the table. We aren't told the nature of the underlying data. If the data is dates and times, the solution is obviously different than if it is just dates. If MySQL is too dumb to realize that I'm only asking for 16 rows from the Numbers table, that yet another problem (which could be solved by switching products of course ;->). – Thomas Mar 01 '11 at 19:02
0

If you have an auxiliary table called nums with integers from 0 to 9, you can generate unbroken sequences of any type. Your issue is not that the count is null for a date value, it's that the date value doesn't exist at all. So say you want monthly counts between January 2004 and Mar 2006, you can create a temporary date list using the nums table like this:

SELECT DISTINCT ADDDATE('2004-01-01',INTERVAL i.i+j.i+k.i MONTH) AS mydate
FROM nums i JOIN nums j  JOIN nums k ORDER BY mydate LIMIT 27;

Then as described elsewhere you join your real data to the date list ON (year=year AND month=month).

Here's a similar query done on my own table (msds) for illustration:

select year(mydate) theyear, monthname(mydate) themonth, coalesce(c,0) thecount
from

(select DISTINCT adddate('2004-01-01',INTERVAL i.i+j.i+k.i MONTH) as mydate
FROM ints i  JOIN ints j  join ints k ORDER BY mydate LIMIT 27) datelist

left join

(SELECT year(issue_date) as y, month(issue_date) as m, count(*) c FROM msds m where issue_date between '2004-01-01' and '2006-03-01'
group by y, m) mydata

on (year(mydate)=y and month(mydate)=m)
dnagirl
  • 20,196
  • 13
  • 80
  • 123