3

Having some real issues with a few queries, this one inparticular. Info below.

tgmp_games, about 20k rows

CREATE TABLE IF NOT EXISTS `tgmp_games` (
  `g_id` int(8) NOT NULL AUTO_INCREMENT,
  `site_id` int(6) NOT NULL,
  `g_name` varchar(255) NOT NULL,
  `g_link` varchar(255) NOT NULL,
  `g_url` varchar(255) NOT NULL,
  `g_platforms` varchar(128) NOT NULL,
  `g_added` datetime NOT NULL,
  `g_cover` varchar(255) NOT NULL,
  `g_impressions` int(8) NOT NULL,
  PRIMARY KEY (`g_id`),
  KEY `g_platforms` (`g_platforms`),
  KEY `site_id` (`site_id`),
  KEY `g_link` (`g_link`),
  KEY `g_release` (`g_release`),
  KEY `g_genre` (`g_genre`),
  KEY `g_name` (`g_name`),
  KEY `g_impressions` (`g_impressions`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;

tgmp_reviews - about 200k rows

CREATE TABLE IF NOT EXISTS `tgmp_reviews` (
  `r_id` int(8) NOT NULL AUTO_INCREMENT,
  `site_id` int(6) NOT NULL,
  `r_source` varchar(128) NOT NULL,
  `r_date` date NOT NULL,
  `r_score` int(3) NOT NULL,
  `r_copy` text NOT NULL,
  `r_link` text NOT NULL,
  `r_int_link` text NOT NULL,
  `r_parent` int(8) NOT NULL,
  `r_platform` varchar(12) NOT NULL,
  `r_impressions` int(8) NOT NULL,
  PRIMARY KEY (`r_id`),
  KEY `site_id` (`site_id`),
  KEY `r_parent` (`r_parent`),
  KEY `r_platform` (`r_platform`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 ;

Here is the query, takes 3 seconds ish

SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15

EXPLAIN

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  r   ALL     r_parent    NULL    NULL    NULL    201133  Using temporary; Using filesort
1   SIMPLE  g   eq_ref  PRIMARY,site_id     PRIMARY     4   engine_comp.r.r_parent  1   Using where

I am just trying to grab the 15 most viewed games, then grab a single review (doesnt really matter which, I guess highest rated would be ideal, r_score) for each game.

Can someone help me figure out why this is so horribly inefficient?

Horse
  • 3,023
  • 5
  • 38
  • 65
  • The `WHERE g.site_id = '34'` condition makes the query equivalent to an `INNER` join. – ypercubeᵀᴹ Aug 09 '12 at 18:23
  • @ypercube can you explain why? – Horse Aug 10 '12 at 12:07
  • Outer joins selects all joined data, plus (in the case of Right join) all data from the right side, `r` here, that do not match any rows of the left side, `g` here. Then these additional rows are rejected here, because `g.site_id` is Null (and of course not equal to 34). Remember, these are the rows that had no matching in `g` table. – ypercubeᵀᴹ Aug 10 '12 at 13:13
  • And anyway, `RIGHT JOIN` doesn't make sense here. I assume you'll never have a review that is not refering to a game. Are you? – ypercubeᵀᴹ Aug 10 '12 at 13:15

2 Answers2

2
  1. I don't understand what is the purpose of having a GROUP BY g_name in your query, but this makes MySQL performing aggregates on the columns selected, or all columns from both table. So please try to exclude it and check if it helps.

  2. Also, RIGHT JOIN makes database to query tgmp_reviews first, which is not what you want. I suppose LEFT JOIN is a better choice here. Please, try to change the join type.

  3. If none of the first options helps, you need to redesign your query. As you need to obtain 15 most viewed games for the site, the query will be:

    SELECT g_id
      FROM tgmp_games g
     WHERE site_id = 34
     ORDER BY g_impressions DESC
     LIMIT 15;
    

    This is the very first part that should be executed by the database, as it provides the best selectivity. Then you can get the desired reviews for the games:

    SELECT r_parent, max(r_score)
      FROM tgmp_reviews r
     WHERE r_parent IN (/*1st query*/)
     GROUP BY r_parent;
    

    Such construct will force database to execute the first query first (sorry for the tautology) and will give you the maximal score for each of the wanted games. I hope you will be able to use the obtained results for your purpose.

vyegorov
  • 21,787
  • 7
  • 59
  • 73
  • Thanks for the answer :) 1. GROUP BY is because there are multiple reviews per game, and I only want 1 review per game. 2. LEFT JOIN could pick up instances of games without reviews, which I dont want, and adding AND r_score != '' doesnt seem to work. 3. Originally I had FROM as reviews and joined the games but that was just as bad. I will give the IN solution a try, however wont IN running on such a huge table also be fairly inefficient? Will give it a try now, thanks! – Horse Aug 10 '12 at 09:25
  • I needed more than just the g_id from games, and also my version of mysql doesnt allow limit within sub queries. So had to do it as 2 separate queries and use PHP to build the IN part of the second query. Also I needed RIGHT JOIN to only return games with reviews, but again have got around that by raising the limit of the games query, and using PHP to limit it and discard the results I dont need. Thanks for the help, both queries combined running at < 0.02 seconds, much better :) – Horse Aug 10 '12 at 09:57
  • 1
    @Horse, if you need games with reviews, use `INNER JOIN`, it fits better and will execute faster. Also, instead of using PHP to combine results of 2 queries think of some other solution, like using views or stored procedures. The fact that your MySQL doesn't support `LIMIT` in the sub-queries seems very limiting :) – vyegorov Aug 10 '12 at 10:27
1

Your MyISAM table is small, you can try converting it to see if that resolves the issue. Do you have a reason for using MyISAM instead of InnoDB for that table?

You can also try running an analyze on each table to update the statistics to see if the optimizer chooses something different.

Kadaan
  • 1,154
  • 1
  • 8
  • 13