GROUP BY slow query with dates

Question

I have a bit of an oddity on a site I'm working on

This query takes six seconds to run

SELECT `z_tv_episodes`.*, `title`.*, `z_networks`.*
FROM `z_tv_episodes`, `title`, `z_networks_title`, `z_networks`
WHERE `season_number` = 1
AND `episode_number` = 1
AND `episode_airdate` < "2017-01-19"
AND `z_tv_episodes`.`media_id` = `title`.`media_id`
AND `has_poster` = 1
AND `title`.`media_id` = `z_networks_title`.`media_id`
AND `z_networks_title`.`network_id` = `z_networks`.`network_id`
GROUP BY `z_tv_episodes`.`media_id`
ORDER BY `episode_airdate` DESC
 LIMIT 6

I've realised that the GROUP BY is the reason it is so slow. Adding a multi column index to episode_airdate and media_id didn't help. Distinct makes it good but then I don't have the columns I need. Any help would be great.

episode_airdate < "2017-01-19" matches roughly 300k rows.

See http://meta.stackoverflow.com/questions/333952/why-should-i-provide-an-mcve-for-what-seems-to-me-to-be-a-very-simple-sql-query - and we don't need 300,000 rows to illustrate the problem. 12 will usually suffice. — Strawberry, Jan 19 '17 at 23:00

score 2 · Accepted Answer · edited May 23 '17 at 11:52

Apart from the performance problem, your query may have a semantic problem, as discussed below.

Having a GROUP BY but without having an aggregation function usually spells troubles. How many rows are there in each group? It can either be "always one" or "sometimes more than one"

If there are always one row in every group, then you don't even need to a GROUP BY to begin with
If there may be more than one rows for at least one group, then for that group(s), what should be value the columns that now have multiple values? (If you have an aggregation function such as sum() or group_concat() then we know what the value would be. Without such function the answer is ambiguous.)

Please see this SO article for the problem:

Why does MySQL allow "group by" queries WITHOUT aggregate functions?

In your particular case, I suggest you first check if your situation is #1 or #2 above, by adding one line to your query. And then we can take it from there.

SELECT 
    `z_tv_episodes`.*
    , `title`.*
    , `z_networks`.*
    , count(*) -- <----------- add this line
FROM 
    `z_tv_episodes`
    , `title`
    , `z_networks_title`
    , `z_networks`
WHERE 
    `season_number` = 1
    AND `episode_number` = 1
    AND `episode_airdate` < "2017-01-19"
    AND `z_tv_episodes`.`media_id` = `title`.`media_id`
    AND `has_poster` = 1
    AND `title`.`media_id` = `z_networks_title`.`media_id`
    AND `z_networks_title`.`network_id` = `z_networks`.`network_id`
GROUP BY `z_tv_episodes`.`media_id`
ORDER BY `episode_airdate` DESC
LIMIT 6

Hi @leeyuiwah I managed to fix this by adding a minimum episode_airdate to reduce the number of rows it matched on the date parameter. count(*) returned 1 but you were right that the GROUP BY isn't needed at all. I'm not sure how I could miss this. Thanks for the help — Callombert, Jan 20 '17 at 08:03

score 0 · Answer 2 · answered Jan 20 '17 at 08:06

I'll post an answer just in case somebody else ends up here. What I did is add another date param so it reads like so:

AND `episode_airdate` < "2017-01-19"
AND `episode_airdate` > "2016-12-19"

This drastically reduced the number of rows MYSQL ended up having to GROUP on.

GROUP BY slow query with dates

2 Answers2