0

I have the fowlloing query

select * from mapping_channel_fqdn_virtual_host where id in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

explaining the above query gives me the following result:

explain select * from mapping_channel_fqdn_virtual_host where id in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+----+-------------+-----------------------------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table                             | partitions | type    | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------------------------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | mapping_channel_fqdn_virtual_host | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |   10 |   100.00 | Using where |
+----+-------------+-----------------------------------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)

id is the primary key of the table. It looks like this is a range type of query and it uses the Primary Key as index for this one.

When I try to explain the following query I get different results:

explain select * from mapping_channel_fqdn_virtual_host where id in (select max(id) from mapping_channel_fqdn_virtual_host group by channel_id, fqdn_virtual_host_id);     
+----+-------------+-----------------------------------+------------+-------+------------------+------------------+---------+------+------+----------+-------------+
| id | select_type | table                             | partitions | type  | possible_keys    | key              | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-----------------------------------+------------+-------+------------------+------------------+---------+------+------+----------+-------------+
|  1 | PRIMARY     | mapping_channel_fqdn_virtual_host | NULL       | ALL   | NULL             | NULL             | NULL    | NULL | 4849 |   100.00 | Using where |
|  2 | SUBQUERY    | mapping_channel_fqdn_virtual_host | NULL       | index | idx_channel_fqdn | idx_channel_fqdn | 8       | NULL | 4849 |   100.00 | Using index |
+----+-------------+-----------------------------------+------------+-------+------------------+------------------+---------+------+------+----------+-------------+
2 rows in set, 1 warning (0,00 sec)

idx_channel_fqdn is a composite index key for the column pair used in the groupby clause. But when using subquery the Primary query stops using the index like it did before. Can you explain why this is happening?

Tried the JOIN query Anouar suggested:

explain select *  from mapping_channel_fqdn_virtual_host as x join (select max(id) as ids from mapping_channel_fqdn_virtual_host group by channel_id, fqdn_virtual_host_id) as y on x.id=y.ids; 
+----+-------------+-----------------------------------+------------+--------+------------------+------------------+---------+-------+------+----------+-------------+
| id | select_type | table                             | partitions | type   | possible_keys    | key              | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------------------------------+------------+--------+------------------+------------------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | <derived2>                        | NULL       | ALL    | NULL             | NULL             | NULL    | NULL  | 4849 |   100.00 | Using where |
|  1 | PRIMARY     | x                                 | NULL       | eq_ref | PRIMARY          | PRIMARY          | 4       | y.ids |    1 |   100.00 | NULL        |
|  2 | DERIVED     | mapping_channel_fqdn_virtual_host | NULL       | index  | idx_channel_fqdn | idx_channel_fqdn | 8       | NULL  | 4849 |   100.00 | Using index |
+----+-------------+-----------------------------------+------------+--------+------------------+------------------+---------+-------+------+----------+-------------+
3 rows in set, 1 warning (0,00 sec)

Judging by the index and eq_ref types it looks like it is better to use the JOIN than the subquery? Can you explain a bit more the outcome of the join explain expressiong?

Apostolos
  • 7,763
  • 17
  • 80
  • 150
  • Will it also choose range scan when you query for `id in (100,101,...,110)`? My wild guess is that the optimizer sees that you are querying the first ten rows, so no need to query the index to find the start entry of the range. And id uses the index to find the maximum value in the second example. – hage Nov 03 '17 at 14:06
  • @Stavr00 the subquery might return more than one values since it is a group by query. – Apostolos Nov 03 '17 at 14:10
  • @hage running id in (100, 101,..., 110) chooses range scan also. – Apostolos Nov 03 '17 at 14:13
  • yeah, then it's as Anouar pointed out in his answer. The optimizer simply decided that asking the index is more expensive than doing the range scan... Don't know if its possible to see estimated costs in MySQL,though. – hage Nov 03 '17 at 14:15
  • Possible duplicate of [MySQL not using indexes with WHERE IN clause?](https://stackoverflow.com/questions/586381/mysql-not-using-indexes-with-where-in-clause) – Stavr00 Nov 03 '17 at 14:20
  • Bottom line: `IN ( SELECT ... )` is notoriously bad for performance. Switch to `JOIN` (in this case) or `EXISTS` (in _some_ other cases). – Rick James Jan 20 '18 at 14:40

1 Answers1

1

You can see the answers for this question You will fine ideas.

I am quoting from some answers

Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)

Briefly, Try forcing the index:

SELECT *
FROM mapping_channel_fqdn_virtual_host FORCE INDEX (name of the index you want to use)
WHERE (mapping_channel_fqdn_virtual_host.id IN (1,2,3,4,5,6,7,8,9,10));

Or use JOIN instead and see the explain

SELECT * FROM mapping_channel_fqdn_virtual_host mcf
JOIN (select max(id) as ids from mapping_channel_fqdn_virtual_host group by channel_id, fqdn_virtual_host_id)) AS mcfv 
ON mcf.id = mcfv.ids;
AnouarZ
  • 1,087
  • 8
  • 23
  • Using the `JOIN` is likely to be faster because the Optimizer will start with the 'derived' table (the subquery). This will boil down to a small number of rows. Then only that small number will be used to reach into `mcf`. The `EXPLAIN` does not have a clue of how many rows, so that column can/should be ignored. – Rick James Jan 20 '18 at 14:37