2

I'm trying to optimize the query below

SELECT `publisher_id`, `publisher_name`, SUM(`views`) AS `total_views`, SUM(`channels`) AS `total_channels`
FROM (
    SELECT DISTINCT `name` AS `publisher_name`, `id` AS `publisher_id` 
    FROM `publishers` 
    WHERE TRIM(`name`) <> ''
     ) AS `publisher_names`
    INNER JOIN
    (
    SELECT `twitch_name`, `publishers` 
    FROM `game_profiles` 
    WHERE `twitch_name` IS NOT NULL 
        AND `publishers` IS NOT NULL 
        AND TRIM(`publishers`) <> ''
    ) AS `game_list`
    ON `game_list`.`publishers` LIKE CONCAT('%', `publisher_names`.`publisher_name`, '%')
    INNER JOIN
    (
    SELECT `games`.`id` AS `id`, `games`.`name`, `games`.`simple_name`, `games`.`box`, SUM(`channels`) AS `channels`, SUM(`viewers`) AS `views`
    FROM `games`
    WHERE `log_date` >= SUBDATE(NOW(), INTERVAL 1 WEEK) 
        AND `log_date` <= SUBDATE(NOW(), INTERVAL 0 WEEK)
    GROUP BY `games`.`id`
    ) AS `view_list`
    ON `game_list`.`twitch_name` = `view_list`.`name`
GROUP BY `publisher_id` ORDER BY `total_views` DESC LIMIT 10;

Checking the performance of the query with the EXPLAIN command, I get the following results.

EXPLAIN result for explicit JOIN

Basically the games table contains the number of views and the channels in an hourly basis, the game_profiles table maps the game to its publiisher(s), and the publishers table contains the more detailed row of each existing publisher. What I'm trying to achieve is display the top 10 publishers based on the total views of its games from the past week.

Running out of ideas, I tried using implicit JOIN. The query is written below

SELECT `publishers`.`id` AS `publisher_id`, `publishers`.`name` AS `publisher_name`, 
SUM(`games`.`viewers`) AS `total_views`, SUM(`games`.`channels`) AS `total_channels` 
FROM `game_profiles`, `publishers`, `games`
WHERE `game_profiles`.`twitch_name` IS NOT NULL 
    AND `game_profiles`.`publishers` IS NOT NULL AND TRIM(`game_profiles`.`publishers`) <> ''
    AND `game_profiles`.`publishers` LIKE CONCAT('%', `publishers`.`name`, '%') 
    AND `game_profiles`.`twitch_name` = `games`.`name`
    AND `games`.`log_date` >= SUBDATE(NOW(), INTERVAL 1 WEEK) 
    AND `games`.`log_date` <= SUBDATE(NOW(), INTERVAL 0 WEEK)
GROUP BY `publisher_id` ORDER BY `total_views` DESC LIMIT 10;

Which gives me the following result for the EXPLAIN command.

EXPLAIN result for implicit JOIN

From what I understand, this should be returning the same result, but the query is running slow in MySQL workbench that I can't wait for its result so I can't verify that it actually returns the same rows. Just judging from the EXPLAIN result however, I think the latter query should run faster. Is there something I'm missing here on why this is not the case? Thank you very much.

P.S. My database design is not really optimal. This is more of a prototype database. No normalization was performed when this was made. I just want to understand better what's happening in my queries. Thank you.

iron59
  • 135
  • 10
  • 2
    you have written sql code in very untidy way(not in proper format) ,so that's not easy to understand easily ... please follow proper formatting standards and then update your question – Ankit Agrawal Apr 04 '17 at 06:38
  • Thanks for the edit @AnkitAgrawal – iron59 Apr 04 '17 at 07:24

1 Answers1

2

In the second query you are doing implicit CROSS JOIN which is not desirable and causes your query to run forever. It means you select all the lines from all the tables first and then after that operation you filter the resultset.

As for the first query.

Your database design is not very good.

The clause game_list.publishers LIKE CONCAT('%', publisher_names.publisher_name, '%' is far from optimal. There should be rather a linking table.

So probably indexing is poor too, check for missing indexes especially on games table, column log_date.

WHERE log_date >= SUBDATE(NOW(), INTERVAL 1 WEEK) 
  AND log_date <= SUBDATE(NOW(), INTERVAL 0 WEEK)

Btw can be rewitten using BETWEEN for better readability:

WHERE log_date BETWEEN SUBDATE(NOW(), INTERVAL 1 WEEK) 
                   AND SUBDATE(NOW(), INTERVAL 0 WEEK)

LTRIM(publishers) <> '' is not sargable, try to avoid that. publishers <> '' should be sufficient.

The grouping of table games in the last INNER JOIN is probably also not optimal. For such a question it is allways better to provide SQL Fiddle with sample data.

But you are doing always one mistake in all your subqueries. You use INNER JOIN (SELECT x WHERE y) as Z ON z.something = a.something. That kills indexing performance.

Optimized query would look something like that (unverified):

SELECT 
    publisher_names.id AS publisher_id
    ,publisher_names.name AS publisher_name
    ,SUM(view_list.views) AS total_views
    ,SUM(view_list.channels) AS total_channels
FROM publishers AS publisher_names
INNER JOIN game_profiles AS game_list ON 
    twitch_name IS NOT NULL
    AND publishers IS NOT NULL
    AND publishers <> ''
    AND publishers LIKE CONCAT('%', publisher_names.publisher_name, '%')
INNER JOIN  games AS view_list 
         ON log_date BETWEEN SUBDATE(NOW(), INTERVAL 1 WEEK) 
                     AND SUBDATE(NOW(), INTERVAL 0 WEEK)        
            AND game_list.twitch_name = view_list.name
WHERE publisher_names.name <> ''
GROUP BY publisher_id
ORDER BY total_views DESC
Community
  • 1
  • 1
Vojtěch Dohnal
  • 7,867
  • 3
  • 43
  • 105
  • Yeah, my database design is not really optimal. This is more of a prototype database. No normalization was performed when this was made. I just want to understand better what's happening in my queries. Thank you for your response. I'm looking at the information that you shared with your answer now. – iron59 Apr 04 '17 at 07:19
  • I didn't know that what I'm doing is an implicit CROSS JOIN. I thought it was an implicit INNER JOIN. Thank you for your response. – iron59 Apr 04 '17 at 07:25