0

I have a table that tracks the activity in several websites. Each row is of the following form: (Date, Hour, Website, Hits) The Hour field is a number between 0 and 23 and represents an entire hour (for example, 22 is for any hits between 22:00 and 22:59).

I want to find the overall slowest hour for each website, meaning the input should be something like (Website, Hour).

In order to do that, I was thinking I should have a nested query to find the minimum hits for each website on each day, and then count the values of Hour (again, for each website on each day), and see which value is the maximal.

I'm still new to SQL so I'm having difficulties using the min() function properly, to find the minimal value only for a specific date and website. Then I have the same problem with using count() for a specific website.

I'm also curious if I can get not just the most common slowest hour, but maybe the 3 slowest, but at least to me it seems like it's really complicating the problem.

For the first nested query, I considered something like this:

SELECT DISTINCT Date Date_t, Website Website_t, Hour,
(SELECT min(Hits) from HITS_TABLE WHERE Date=Date_t and Website=Website_t) as MinHits
FROM HITS_TABLE

But not only it takes an abnormally long time to calculate, it also gives me multiple entries of (Date_t, Website_t, Hour, min(Hits)) for each value of Hour, so I take it that I'm not doing it in the smartest, nor the most efficient way.

Thanks in advance for any help!

1 Answers1

1

You can get the minimum hour using a trick in MySQL:

select website, substring_index(group_concat(hour order by hits), ',', 1) as minhour
from table t
group by website;

For each website, this constructs a comma-delimited list of hours, ordered by the number of hits. The function substring_index() returns the first row.

This is something of a hack. In most other databases, you would use window/analytic functions, but these are not available in MySQL.

EDIT:

You can do this in standard SQL as well:

select t.*
from table t
where not exists (select 1
                  from table t2
                  where t2.hour = t.hour and
                        t2.hits < t.hits
                 );

This is interpreted as: "Get me all rows from the table where there are no other rows with the same hour and a lower number of hits." This is a round-about way of saying: "Get me the hour with the minimum value." Note that this will return multiple rows when there are ties.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I think if the data was normalised the hack wouldn't be needed, you could select * from websites, then in the column list use a subquery to return the matching smallest hour entry. – scragar Jun 13 '14 at 22:50
  • @scragar . . . (1) The data is normalized. (2) The hack isn't strictly necessary. (3) Feel free to post your own answer. – Gordon Linoff Jun 13 '14 at 23:03