3

I have a MySQL table "page" that contains fields "page_name" and "page_timestamp". The table stores page and times requests that were made on a website. I'm trying to write a query that gives me the most hit page for every 5 min period.

Output like

Time    Page           Hits 
12:00  index.html     34
12:05  page1.html     11
12:10  index.html     44

This is one attempt, but no output.

select pages_timestamp,
    (select  count( pages_name)
     from pages t2
     where UNIX_TIMESTAMP(t2.pages_timestamp) DIV 300 =
         UNIX_TIMESTAMP(pages_timestamp)
     group by pages_name
     order by count(pages_name) desc
     limit 1 )  
from pages
where ...
group by UNIX_TIMESTAMP(pages_timestamp ) DIV 300

I'm sure there is a better approach

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
pineoclean
  • 33
  • 3

1 Answers1

0

This assumes page_timestamp is a DATETIME type.

Here is a sample with some data I created, it helps to visualize the output.

So first off, lets create a function for time rounding, this will declutter our query:

drop function if exists rtime;
create function rtime (time DATETIME)
returns DATETIME

BEGIN
DECLARE newtime DATETIME;
set newtime = from_unixtime(floor(unix_timestamp(time)/300)*300);
return newtime;
END;

Next let's explore our data a little bit. I want to pull all Time, Page, and Count(Page_Name), grouping by interval and page.

SELECT
Time, Page, Hits 
from (
    select rtime(pages_timestamp) as time
    , page_name as page
    ,  count(page_name) as hits 
    from pages 
    group by page_name, rtime(pages_timestamp) 
    order by rtime(pages_timestamp), hits desc) g ;

This orders our aggregated table by time interval then the number of hits. Since our groups are ordered by most hits desc we can pull the first row. In MySQL we can non-aggregate non-group by columns (src. This gives us the first row per group, the one with the most hits. We just SELECT * from the above table grouping only by time:

select * from 
(select time, page, hits 
from (
     select rtime(pages_timestamp) as time
     , page_name as page
     ,  count(page_name) as hits 
     from pages 
     group by page_name, rtime(pages_timestamp) 
     order by rtime(pages_timestamp), hits desc
     ) g 
) h group by time;

Disclaimer!: If there is a tie, if two pages both have the most hits, this will only pull one record.

Community
  • 1
  • 1
EoinS
  • 5,405
  • 1
  • 19
  • 32
  • Would it be better to have a function for floor 5 min TIMESTAMP and use that for shorthand? Suggestions welcome – EoinS May 26 '16 at 05:10
  • `select SEC_TO_TIME( (ROUND(TIME_TO_SEC(user_log_pages_timestamp)/300)) * 300) , ( select count(user_log_pages_basename) from user_log_pages p where SEC_TO_TIME( (ROUND(TIME_TO_SEC(user_log_pages_timestamp)/300)) * 300) = < this part not working SEC_TO_TIME( (ROUND(TIME_TO_SEC(p.user_log_pages_timestamp)/300)) * 300) group by p.user_log_pages_basename order by count(user_log_pages_basename) desc limit 1 ) From user_log_pages GROUP BY SEC_TO_TIME( (ROUND(TIME_TO_SEC(user_log_pages_timestamp)/300)) * 300) Out 07:55:00 66 08:00:00 66` – pineoclean May 26 '16 at 07:29
  • working on this. Quick question, what type is your pages_timestamp? Is it datetime? Or is it stored as string like 12:01 etc? – EoinS May 26 '16 at 16:06
  • Your example works perfectly. Thank you so much. yes its a datetime field. – pineoclean May 27 '16 at 00:25