3

I'm trying to find a way to speed up a slow (filesort) MySQL query.

Tables:

categories (id, lft, rgt)
questions (id, category_id, created_at, votes_up, votes_down)

Example query:

SELECT * FROM questions q 
INNER JOIN categories c ON (c.id = q.category_id)
WHERE c.lft > 1 AND c.rgt < 100
ORDER BY q.created_at DESC, q.votes_up DESC, q.votes_down ASC
LIMIT 4000, 20

If I remove the ORDER BY clause, it's fast. I know MySQL doesn't like both DESC and ASC orders in the same clause, so I tried adding a composite (created_at, votes_up) index to the questions table and removed q.votes_down ASC from the ORDER BY clause. That didn't help and it seems that the WHERE clause gets in the way here because it filters by columns from another (categories) table. However, even if it worked, it wouldn't be quite right since I do need the q.votes_down ASC condition.

What are good strategies to improve performance in this case? I'd rather avoid restructuring the tables, if possible.

EDIT:

CREATE TABLE `categories` (
  `id` int(11) NOT NULL auto_increment,
  `lft` int(11) NOT NULL,
  `rgt` int(11) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `lft_idx` (`lft`),
  KEY `rgt_idx` (`rgt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `questions` (
  `id` int(11) NOT NULL auto_increment,
  `category_id` int(11) NOT NULL,
  `votes_up` int(11) NOT NULL default '0',
  `votes_down` int(11) NOT NULL default '0',
  `created_at` datetime NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `questions_FI_1` (`category_id`),
  KEY `votes_up_idx` (`votes_up`),
  KEY `votes_down_idx` (`votes_down`),
  KEY `created_at_idx` (`created_at`),
  CONSTRAINT `questions_FK_1` FOREIGN KEY (`category_id`) REFERENCES `categories` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

id select_type table type   possible_keys           key     key_len ref                 rows  Extra
1  SIMPLE      q     ALL    questions_FI_1          NULL    NULL    NULL                31774 Using filesort
1  SIMPLE      c     eq_ref PRIMARY,lft_idx,rgt_idx PRIMARY 4       ttt.q.category_id 1       Using where
Ree
  • 6,061
  • 11
  • 48
  • 50
  • Have you indexed the columns at all? – BenOfTheNorth Feb 26 '12 at 12:04
  • 1
    You didn't post the output of `SHOW CREATE TABLE` for 2 tables nor the output of `EXPLAIN`. You also have `LIMIT 4000, 20`. If you need help profiling queries, please post the output of the mentioned queries. – N.B. Feb 26 '12 at 12:06
  • @Ben Griffiths: All columns have indexes. – Ree Feb 26 '12 at 12:07
  • @N.B.: I don't think it matters much in this case, but I have edited my question. – Ree Feb 26 '12 at 12:40
  • It matters. Check your explain output - 31774 rows examined for your query. That indicates you have no limiting factor for the `questions` table. You have all columns indexed - that's bad. Have you checked index cardinality at all? Also, limit 4000,20 will force mysql to discard 4000 rows before returning 20 you want. – N.B. Feb 26 '12 at 13:16
  • Have you tried potting `c.lft > 1 AND c.rgt < 100` conditions inside `ON ()` ? – Slava Feb 26 '12 at 13:20
  • @N.B. I think that's normal pagination LIMIT. Do you have any better suggestion? – Slava Feb 26 '12 at 13:22
  • @Slava I know what it is, and I also know what MySQL does with it. Google a bit to see what's going on behind the scenes with `LIMIT OFFSET` clause. – N.B. Feb 26 '12 at 13:43
  • @Slava: This seems to have solved the problem (filesort is still there, but the speed has increased 10 fold). I wonder why this is the case? – Ree Feb 26 '12 at 13:45
  • @Ree Ok, I'll add it as an answer then. And for the question "why"... I really have no idea. That's another of MySQL's mysteries to me like treating a subquery that uses no external data as a DEPENDENT SUBQUERY and running it X times... Sometimes. And sometimes only once. Google might know but I haven't found anything. – Slava Feb 26 '12 at 17:28
  • @N.B. Ok, I presume it does something bad with it. But the question is: do you know any better way to paginate results? If you do, please share that knowledge with us. Um-m, actually just found one. Ree, you may want to have a look at this too: http://stackoverflow.com/q/1243952/598472 – Slava Feb 26 '12 at 17:45
  • @Slave - with `LIMIT 4000, 20`, MySQL will fetch 4020 results and then discard 4000 of them. It means that all 4020 records will have to be computed, pulled from the HDD, stored in RAM and then majority is discarded. There are articles how to design better pagination, or get around this LIMIT OFFSET limitation, mentioned on this site a few times. Google for "late row lookups" to get some relevant results. – N.B. Feb 26 '12 at 18:35

3 Answers3

1

Try a subquery to get the desired categories:

SELECT * FROM questions 
WHERE category_id IN ( SELECT id FROM categories WHERE lft > 1 AND rgt < 100 )
ORDER BY created_at DESC, votes_up DESC, votes_down ASC
LIMIT 4000, 20
dgw
  • 13,418
  • 11
  • 56
  • 54
0

Try selecting only what you need in your query, instead of the SELECT *

Why not to use SELECT * ( ALL ) in MySQL

conrad10781
  • 2,223
  • 19
  • 17
  • 1
    This has nothing to do with my question. SELECT * is here to make the example shorter (I don't actually use it). – Ree Feb 26 '12 at 12:50
  • I'm not sure how that's not the case. Being able to use an INDEX makes things faster. Having a SELECT * in your statement is making that impossible. Your EXPLAIN is showing just that. – conrad10781 Feb 26 '12 at 12:54
  • The columns selected do not have any influence on indexing at all. – Ree Feb 26 '12 at 13:40
  • 1
    Columns selected **do** affect indexing or rather do affect the performance gained from indexes; because if you only select indexed columns then it can actually operate entirely on indexes and not actually retrieve the table row. – Seph Feb 27 '12 at 06:10
0

Try putting conditions, concerning joined tables into ON clauses:

SELECT * FROM questions q 
INNER JOIN categories c ON (c.id = q.category_id AND c.lft > 1 AND c.rgt < 100)
ORDER BY q.created_at DESC, q.votes_up DESC, q.votes_down ASC
LIMIT 4000, 20
Slava
  • 2,040
  • 15
  • 15
  • I really don't have any idea on why this should help but it may have something to do with WHERE being the place to put conditions on all tables. Perhaps it cannot use joined table's indexes in that mode. Or maybe not. I can only dream up reasons here. :) Would be glad if somebody could share that knowledge. – Slava Feb 26 '12 at 17:37
  • Ree, you may want to have a look at this too for another way to paginate results: http://stackoverflow.com/q/1243952/598472 . However, looks tricky to do with user-defined ordering on any columns. You should totally check if it is worth all the effort to maintain additional layer before implementing it. :) – Slava Feb 26 '12 at 17:52