
We had a table courses_old with around 1.6 million rows (the table is used by legacy apps, so we can't change the schema itself):

mysql> describe courses_old ;
+---------------+------------------+------+-----+---------+-------+
| Field         | Type             | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+-------+
| chapter       | varchar(128)     | NO   | PRI |         |       |
| label         | varchar(128)     | NO   | PRI |         |       |
| title         | varchar(128)     | YES  |     | NULL    |       |
| language      | varchar(48)      | NO   | PRI | en      |       |
+---------------+------------------+------+-----+---------+-------+
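
For reference, here is a table definition consistent with the DESCRIBE output above (a reconstruction, not the original DDL):

CREATE TABLE courses_old (
    chapter  varchar(128) NOT NULL,
    label    varchar(128) NOT NULL,
    title    varchar(128) DEFAULT NULL,
    language varchar(48)  NOT NULL DEFAULT 'en',
    PRIMARY KEY (chapter, label, language)
);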

We often run this query:

SELECT DISTINCT
    chapter,
    title,
    language
FROM courses_old;

This query was quite slow, so we added an index on these three fields, which made it much faster.
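
A sketch of the statement we used (the index name is taken from the EXPLAIN output below; the exact DDL is an assumption):

ALTER TABLE courses_old
    ADD INDEX idx_c_t_l_old (chapter, title, language);

With the index in place, the EXPLAIN looked like this: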

+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+--------------------------+
| id | select_type | table       | type  | possible_keys  | key           | key_len | ref  | rows  | Extra                    |    
+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | courses_old | range | idx_c_t_l_old  | idx_c_t_l_old | 919     | NULL | 65072 | Using index for group-by |
+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+--------------------------+

We discovered that a lot (50%!) of the values were unused, so we did the following (a sketch of the statements appears after this list):

  • renaming the table to courses_old
  • recreating the table, this time named courses_new
  • doing an INSERT INTO courses_new SELECT * FROM courses_old WHERE ...
  • recreating the same index
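
A minimal sketch of that migration (the table's original name and the WHERE condition aren't given above, so those parts are assumptions):

RENAME TABLE courses TO courses_old;      -- 'courses' stands in for the original name
CREATE TABLE courses_new (                -- same definition, without the secondary index
    chapter  varchar(128) NOT NULL,
    label    varchar(128) NOT NULL,
    title    varchar(128) DEFAULT NULL,
    language varchar(48)  NOT NULL DEFAULT 'en',
    PRIMARY KEY (chapter, label, language)
);
INSERT INTO courses_new
    SELECT * FROM courses_old WHERE ...;  -- the actual filter condition is elided
ALTER TABLE courses_new
    ADD INDEX idx_c_t_l_new (chapter, title, language);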

The number of rows is now 840,000, and the EXPLAIN now looks like this:

SELECT DISTINCT
    chapter,
    title,
    language
FROM courses_new;


+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+-------------+
| id | select_type | table       | type  | possible_keys  | key           | key_len | ref  | rows  | Extra       |
+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+-------------+
|  1 | SIMPLE      | courses_new | index | idx_c_t_l_new  | idx_c_t_l_new | 919     | NULL | 809549| Using index |
+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+-------------+

And the query time has gone from 0.2s to 1s.

However, if I now do

SELECT DISTINCT
    chapter,
    title,
    language
FROM courses_new
FORCE INDEX (idx_c_t_l_new);

the performance is back to normal, and so is the EXPLAIN:

+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+--------------------------+
| id | select_type | table       | type  | possible_keys  | key           | key_len | ref  | rows  | Extra                    |
+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | courses_new | range | idx_c_t_l_new  | idx_c_t_l_new | 919     | NULL | 101194| Using index for group-by |
+----+-------------+-------------+-------+----------------+---------------+---------+------+-------+--------------------------+

So my questions are:

  • why has MySQL suddenly decided to stop using the same plan, even though the new one is less efficient?
  • is there a way (by tuning some parameters, etc.) to bring MySQL back to reason so it finds the most efficient query plan on its own? (one way to inspect its decision is sketched after this list)
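
For reference, MySQL 5.6+ can show why the optimizer picked a given plan via the optimizer trace (a general inspection technique, not something from the question itself):

SET optimizer_trace="enabled=on";
SELECT DISTINCT chapter, title, language FROM courses_new;
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
SET optimizer_trace="enabled=off";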

Edit: some clarifications:

  • I've kept both versions of the table, so they run under exactly the same conditions
  • the new table has fewer rows than the old one
  • I've benchmarked each query 1000 times to make sure no caching was causing the performance difference (a sketch of such a benchmark follows this list)
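
For reference, a benchmark along these lines can be run with mysqlslap (our actual script isn't shown here; the schema name is a placeholder):

mysqlslap --create-schema=mydb \
    --query="SELECT DISTINCT chapter, title, language FROM courses_new" \
    --iterations=1000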
Comments:
  • It may just be hitting a memory limit since your data has grown. [Check out this answer](http://stackoverflow.com/questions/12104185/how-to-set-memory-limit-in-my-cnf-file) – JNevill Jun 23 '16 at 16:23
  • Actually we've gone from 1.6 million rows to 800 thousand, so we've halved the data set, and it's not an overall growth of the database: since we've kept the old table, we've been able to run the tests under the very same conditions – allan.simon Jun 23 '16 at 18:47
  • Your second query seems to be using the same key (if that is the correct explain; if not, please add the correct explain, and please add the explain when using `force`). Maybe it was just faster because your index was buffered the second time you ran it (try running the first query without `force` after the one with `force`). Your index is really big (bigger than your table), so it might take some time to load it from the HDD. – Solarflare Jun 23 '16 at 20:04
  • @Solarflare, actually it's the same index name but on two different tables (the _old one and the _new one). We've run each query a thousand times in a script to benchmark, to avoid, as you said, having the index buffered/not buffered. Our DB machines are Amazon RDS with 16 GB of RAM and SSDs. I'm going to add the explain for the `force` version, thanks for your help :) – allan.simon Jun 23 '16 at 20:11
  • @Solarflare I've added the explain of the `force` version, and I've renamed the table and indexes so that it's clearer – allan.simon Jun 23 '16 at 20:33
  • Well, `force` (and other hints like it) is the right way to tell MySQL to take an execution plan other than the one it prefers, so you are using the right tool, and it seems to work as expected. Do not change your server settings to optimize one query (unless you just upgraded your MySQL version). I'm not really sure why it chose that plan there. Maybe try `optimize table courses_new` to reset statistics. – Solarflare Jun 23 '16 at 23:15
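
For reference, the statement suggested in the last comment, together with ANALYZE TABLE, which refreshes index statistics without rebuilding the table:

OPTIMIZE TABLE courses_new;
ANALYZE TABLE courses_new;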
