
I'm reasonably sure the answer to this lies in having a different index. I have a query that's unreasonably slow, but only in the following complete form; if I remove parts of the query it's blazing fast. How can I make it better?

Slow:

SELECT json
  FROM requests
  WHERE spider = 'foo'
    AND load_count = ( SELECT MIN( load_count ) FROM requests )
    AND load_count < 50
  LIMIT 500;

EXPLAIN:

+----+-------------+----------+------+-------------------------+--------------+---------+-------+--------+------------------------------+
| id | select_type | table    | type | possible_keys           | key          | key_len | ref   | rows   | Extra                        |
+----+-------------+----------+------+-------------------------+--------------+---------+-------+--------+------------------------------+
|  1 | PRIMARY     | requests | ref  | load_count,spider_index | spider_index | 90      | const | 200845 | Using where                  |
|  2 | SUBQUERY    | NULL     | NULL | NULL                    | NULL         | NULL    | NULL  |   NULL | Select tables optimized away |
+----+-------------+----------+------+-------------------------+--------------+---------+-------+--------+------------------------------+

Database structure:

CREATE TABLE `requests` (
  `added` int(11) NOT NULL AUTO_INCREMENT,
  `url` char(255) NOT NULL,
  `spider` char(30) NOT NULL,
  `referer` char(255) DEFAULT NULL,
  `json` text NOT NULL,
  `load_count` int(11) NOT NULL DEFAULT '0',
  `processed` tinyint(1) NOT NULL DEFAULT '0',
  `invalid` tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`added`),
  UNIQUE KEY `url` (`url`),
  KEY `load_count` (`load_count`),
  KEY `spider_index` (`spider`)
) ENGINE=MyISAM AUTO_INCREMENT=5285840 DEFAULT CHARSET=utf8

After updating my index as Neo suggested, I get drastic improvements:

+----+-------------+----------+------+-------------------+-------------------+---------+-------------+------+------------------------------+
| id | select_type | table    | type | possible_keys     | key               | key_len | ref         | rows | Extra                        |
+----+-------------+----------+------+-------------------+-------------------+---------+-------------+------+------------------------------+
|  1 | PRIMARY     | requests | ref  | spider_load_count | spider_load_count | 94      | const,const | 1487 | Using where                  |
|  2 | SUBQUERY    | NULL     | NULL | NULL              | NULL              | NULL    | NULL        | NULL | Select tables optimized away |
+----+-------------+----------+------+-------------------+-------------------+---------+-------------+------+------------------------------+
Kit Sunde
  • Check what happens when you add EXPLAIN to your SELECT statement, it will tell you what MySQL does. – Michael J.V. Apr 07 '11 at 08:52
  • Have you examined the query plan? – Stephen Chung Apr 07 '11 at 08:52
  • Adding such a simple constraint should not make a query slower... – Stephen Chung Apr 07 '11 at 08:53
  • It sounds like you run the query (slow) and then you retrieve it from cache. Using EXPLAIN SELECT should tell you which fields MySQL wants to use or uses for its query planning. – Michael J.V. Apr 07 '11 at 08:57
  • Added EXPLAINS and you are right I must've been retrieving from cache on the "fast" ones. – Kit Sunde Apr 07 '11 at 10:17
  • You can test queries with SELECT SQL_NO_CACHE .... (it will not do anything if the query is already cached; you can also flush the caches). EXPLAIN EXTENDED is a friend, and for your purpose you might find profiling useful - http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html – Unreason Apr 07 '11 at 10:29
  • For an explanation of the EXPLAIN output, check http://stackoverflow.com/questions/1992312/meaning-of-select-tables-optimized-away-in-mysql-explain-plan; your query should be fast - the profiler can be useful here to figure out what's going on. – Unreason Apr 07 '11 at 10:33
  • @Unreason - I've never seen SQL_NO_CACHE before, that'll come handy. Thanks. – Kit Sunde Apr 07 '11 at 11:24
  • @Kit Sunde, use with care - that is for pure query cache, mysql will still cache some things (I think mainly stuff related to indexes). Also, benchmarks of cached results on parts of the database that are rarely updated can be more important for performance in production. – Unreason Apr 07 '11 at 12:42
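
For reference, a minimal sketch of the SQL_NO_CACHE and profiling approach Unreason mentions in the comments above, applied to the query from the question (assumes a MySQL version with the session profiler available):

SET profiling = 1;

SELECT SQL_NO_CACHE json
  FROM requests
  WHERE spider = 'foo'
    AND load_count = ( SELECT MIN( load_count ) FROM requests )
    AND load_count < 50
  LIMIT 500;

SHOW PROFILES;
-- SHOW PROFILE FOR QUERY <n>;  -- per-stage breakdown for a query listed by SHOW PROFILES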

3 Answers

-- drop the two single-column indexes
alter table requests drop index load_count;
alter table requests drop index spider_index;

-- add one composite index covering both columns in the WHERE clause
alter table requests add index spider_load_count(load_count, spider);
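
A quick way to confirm the change took effect is to check the index list and re-run EXPLAIN on the query from the question (a sketch; the plan should roughly match the improved EXPLAIN shown above):

SHOW INDEX FROM requests;

EXPLAIN SELECT json
  FROM requests
  WHERE spider = 'foo'
    AND load_count = ( SELECT MIN( load_count ) FROM requests )
    AND load_count < 50
  LIMIT 500;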
neocanable

What about this?

SELECT MIN(load_count) INTO @min_load_count FROM requests;

SELECT json
  FROM requests
  WHERE load_count = @min_load_count
    AND load_count < 50
  LIMIT 500;

And having an index on the spider field may help you.

Devart
  • The query didn't improve anything as it seems MySQL was already optimizing away the inner query, and I have an index on the spider as you can see in the CREATE TABLE in my question. – Kit Sunde Apr 07 '11 at 10:57

A few comments/suggestions:

  • Have you tried using the MySQL EXPLAIN statement on your slow SELECT statement? This will probably give you some indication of the problem.
  • I suspect the issue with the slow query is that it has both spider and load_count in the WHERE clause, but no index that covers both fields. Adding an index that covers both will probably fix this example (a sketch follows this list).
  • The first two queries have "AND load_count < 50" in the WHERE, which is not needed as you also have a "load_count = [exact value]". MySQL will ignore the "AND load_count < 50" in its query optimisation.
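
A minimal sketch of the composite index the second point suggests (essentially what neocanable's answer above already does; the index name and column order here are illustrative, since both predicates used for the lookup are equality comparisons):

ALTER TABLE requests ADD INDEX spider_load_count (spider, load_count);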
pkt1975
  • 1) Added explains. 2) I'm attempting this now, as per Neo's suggestion. 3) If the MIN count is higher than 50 I don't want the row. So it's necessary. – Kit Sunde Apr 07 '11 at 11:00