2

I have a table with approx 500,000 rows and I'm testing two composite indexes for it. The first index puts the ORDER BY column last, and the second one is in reverse order.

What I don't understand is why the second index appears to offer better performance by estimating 30 rows to be scanned compared to 889 for the first query, as I was under the impression the second index could not be properly used as the ORDER BY column is not last. Would anyone be able to explain why this is the case? MySQL prefers the first index if both exist.

Note that the second EXPLAIN lists possible_keys as NULL but still lists a chosen key.

1) First index

ALTER TABLE user ADD INDEX test1_idx (city_id, quality);

(cardinality 12942)

EXPLAIN SELECT * FROM user u WHERE u.city_id = 3205 ORDER BY u.quality DESC LIMIT 30;
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref            | rows | Extra       |
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+
|  1 | SIMPLE      | u     | ref    | test1_idx     | test1_idx | 3       | const          |  889 | Using where | 
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+

2) Second index (same fields in reverse order)

ALTER TABLE user ADD INDEX test2_idx (quality, city_id);

(cardinality 7549)

EXPLAIN SELECT * FROM user u WHERE u.city_id = 3205 ORDER BY u.quality DESC LIMIT 30;
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref            | rows | Extra       |
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+
|  1 | SIMPLE      | u     | index  | NULL          | test2_idx | 5       | NULL           |  30  | Using where | 
+----+-------------+-------+--------+---------------+-----------+---------+----------------+------+-------------+

UPDATE:

The second query does not perform well in a real-life scenario whereas the first one does, as expected. I would still be curious as to why MySQL EXPLAIN provides such opposite information.

Tom
  • 30,090
  • 27
  • 90
  • 124
  • 1
    https://stackoverflow.com/a/2292952/2129097 should help you understand that in short, in my understanding. both indexes create different groups of data indexed, and for your data, the second index allow for easier find of the group you're looking for (but it may be completely different for other query and the other index will be faster). just by looking at id 3205, you cannot tell wheter index is better for all cases. – Jan Myszkier Aug 02 '18 at 14:57
  • Please provide `SHOW CREATE TABLE user`; there may be subtle things going on, such as `city_id` being `VARCHAR`. Also what percentage of the table has `city_id = 3205`? How fast did each run? – Rick James Aug 21 '18 at 04:16
  • And provide `EXPLAIN FORMAT=JSON SELECT ...` to get more details. – Rick James Aug 21 '18 at 04:21

2 Answers2

1

I guess your data type city_id: MEDIUMINT 3 Bytes quality: SMALLINT 2 Bytes

As I know, For

SELECT * FROM user u WHERE u.city_id = 3205 ORDER BY u.quality DESC LIMIT 30;

The second index(quality, city_id) can not be fully used. Because Order by is Range scan, which can only do for last part of your index.

The first Index looks fit perfect. I guess that some time Mysql is not so smart. maybe the amount of city_id targeted could effect mysql decide which index will be used.

You may try key word

FORCE INDEX(test1_idx)
wa56
  • 331
  • 2
  • 13
1

The rows in EXPLAIN is just an estimate of the number of rows that MySQL believes it must examine to produce the result.

I remembered reading one article from Peter Zaitsev of Percona that this number could be very inaccurate. So you can not simply compare the query efficiency based on this number.

I agree with you that the first index will produce better result in normal scenarios.

You should have noticed that the type column in the first EXPLAIN is ref while index for the second. ref is usually better than a index scan. As you mentioned, if both keys exists, MySQL prefer the first one.

Jacob
  • 1,776
  • 14
  • 11