3

I'm sure this has been answered before, but I can't find it in the detail that I need.

For an analytics system I need to be able to group rows and return them on a chart, either grouped by minute, hour, day, month, or year. I have this working correctly (example code below).

SELECT COUNT( DISTINCT user_id ) ,  `hour` ,  `timestamp` 
FROM tracking_request
WHERE site_id =  '3'
AND  `timestamp` <  '2011-08-31 04:05:45'
AND  `timestamp` >  '2011-08-29 22:00:00'
GROUP BY  `hour` ,  `day` ,  `month` ,  `year` 
ORDER BY  `timestamp` ASC

The problem is, like most charts, I need to fill the gaps where the data isn't (eg. no rows for the last 3 minutes). I read about creating a "calendar table" and joining that data, but how can I do this efficiently for each scale (eg. year would be a lot easier then minute, as minute would require MANY rows in the table)? If it helps, there is a column for each in the table (like above, you can see there is "hour", "day", etc.)

EDIT:

I ended up using PHP to accoplish this by using an empty array and then filling it. If anyone could think of an all (or mostly) SQL solution for this that would be more awesome.

Coolist
  • 89
  • 4
  • 9
  • There is no point in "filling the gaps" when there's no data. Your graphing utility/API/whatever should be able to generate a chart regardless. If it can't, I suggest you look for another one. – NullUserException Aug 31 '11 at 04:10
  • What you are looking for is called "interpolation": http://stackoverflow.com/questions/5758800/mysql-interpolate-data – Daniel Pereira Aug 31 '11 at 04:19
  • @NullUserException That doesn't exactly help. If, for whatever reason, I just have the values in order and have the last 30 days on a graph, I need to fill those values without providing the graphing 'utility' the date/time. >Daniel Pereira Thanks, though that article doesn't explain how to make an efficient calendar table for this situation. – Coolist Aug 31 '11 at 04:37
  • @Coolist I don't get it. So you have values in order but you don't know what their dates are? – NullUserException Aug 31 '11 at 04:38
  • @NullUserException No, I have the dates but the graph can't just show "Aug 10 - 5 Views", "Aug 12 - 6 Views", "Aug 15 - 2 Views". It needs to have the dates inbetween returned (eg. "Aug 10 - 5 Views", "Aug 11 - 0 Views", "Aug 12 - 6 Views"), etc). If you're asking why the graphing utility doesn't take both the date and the values, and then fill in the gap automatically, it requires the x (value) and y (date) data. – Coolist Aug 31 '11 at 04:48

2 Answers2

2

In this answer I will outline how to generate your calendar tables.

Create three tables for days, hours and minutes:

CREATE TABLE days (
  day DATE,
  PRIMARY KEY (day)
) 
CREATE TABLE hours (
  hour INT,
  PRIMARY KEY (hour)
) 
CREATE TABLE minutes (
  minute INT,
  PRIMARY KEY (minute)
) 

Fill the hours table with the numbers from 0 to 23 and the minutes table with the numbers from 0 to 59. To fill the days table, you can create a procedure like the following:

CREATE PROCEDURE make_days(IN start_date DATE, IN end_date DATE)
BEGIN
  DECLARE curr_date DATE;
  SET curr_date = start_date;
  WHILE curr_date <= end_date DO
    INSERT IGNORE INTO days(day)  VALUES(curr_date);
    SET curr_date = DATE_ADD(curr_date, INTERVAL 1 DAY);
  END WHILE;
END

You can then call this procedure to create days like this:

CALL make_days('2011-01-01','2012-12-31');

Now you can create values for every minute in a given time interval using a query similar to the following:

SELECT YEAR(day) AS year, MONTH(day) AS month, DAYOFMONTH(day) AS day, hour, minute
FROM days, hours, minutes
WHERE CAST(CONCAT(day,' ',hour,':',minute) AS DATETIME) BETWEEN '2011-08-31 22:00' AND '2011-09-01 10:00'
ORDER BY year, month, day, hour, minute
Jakob Egger
  • 11,981
  • 4
  • 38
  • 48
  • Thanks, that was really helpful and well explained. I assume one would have to make the days every so often so the table doesn't run out, would it be better just to fill it from the start until a date far away? – Coolist Sep 01 '11 at 06:00
  • 1
    I recently found myself debugging a website I originally created 10 years ago, because at the time I thought: "Surely 5 digits will suffice for the record number!". Turns out that putting arbitrary limits in your code is a bad idea. But I'm sure that if you just insert all days for the next 15 years you'll be fine ;) – Jakob Egger Sep 01 '11 at 07:53
  • This seemed an interesting solution. Though it seems that if the user has a read-only access to the database, this solution does not work. I wonder if there's another option available to do this? –  Jun 18 '12 at 07:21
1

This is typically done in data warehouses by having a date dimension that has a list of all possible dates. You do an OUTER JOIN to the date dimension and COALESCE the null values to 0.

DrewM
  • 1,878
  • 14
  • 13