3

Here is the table definition

CREATE TABLE `dt_prdtime` (
  `TCompany` varchar(3) NOT NULL DEFAULT '',
  `TPerCode` varchar(8) NOT NULL,
  `TBegDateTime` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' COMMENT 'วันที่',
  `TQPay` int(1) NOT NULL DEFAULT '2',
  `TYear` int(4) NOT NULL,
  `TMonth` int(2) NOT NULL,
  PRIMARY KEY (`TCompany`,`TPerCode`,`TBegDateTime`),
  KEY `TMonth` (`TMonth`) USING BTREE,
  KEY `TPerCode` (`TPerCode`,`TYear`,`TMonth`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

And this is data sample. This table has 10000+ records and value in TMonth field varies

+----------+----------+---------------------+-------+-------+--------+
| TCompany | TPerCode | TBegDateTime        | TQPay | TYear | TMonth |
+----------+----------+---------------------+-------+-------+--------+
| S10      | 000001   | 2016-01-02 17:33:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-02 07:48:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-03 17:39:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-03 07:30:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-04 17:49:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-04 07:54:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-05 17:50:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-05 07:36:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-06 17:37:00 |     1 |  2016 |      1 |
| S10      | 000001   | 2016-01-06 07:35:00 |     1 |  2016 |      1 |
+----------+----------+---------------------+-------+-------+--------+

With EXPLAIN, This query uses TMonth index:

SELECT * FROM dt_prdtime WHERE TMonth = 5

while this one refuses to use the index:

SELECT * FROM dt_prdtime WHERE TMonth IN (5,6)

I tested with another simple table,

CREATE TABLE `table1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

SELECT * FROM table2 WHERE id IN (5,6)

and the index for this table was used

Can anybody explain this? Is there something wrong with dt_prdtime table?

Red Romanov
  • 454
  • 5
  • 11
  • This might be due to the optimizer. Sometimes optimizer estimates not use an index, even if one is available. Have you tried FORCE index? – Malinga Jun 30 '16 at 04:15
  • SELECT * FROM dt_prdtime FORCE INDEX (`TMonth`) WHERE TMonth IN (5, 6) – Malinga Jun 30 '16 at 04:24
  • 1
    You shouldn't have to force an index. That is a good time to pack your bags and not use that rdbms – Drew Jun 30 '16 at 04:25
  • agree, because we have to believe optimizer has a point if not using index – Malinga Jun 30 '16 at 04:27
  • Are you deliberately ignoring the year when asking about May? If not, there is a better way to do your task, even for (5,6). – Rick James Jul 02 '16 at 02:43

2 Answers2

2

I will go out on a limb and say it is because you are using the MyISAM engine.

It is working perfectly fine with INNODB as can be seen in this Answer of mine.

I will try to spook up at least 1 honorable reference on the matter.

Here, The range Join Type, clearly an INNODB focus as it is the default engine. And when not explicitly mentioned in the manual in some documentation hierarchy, it is assumed.

Note, there is nothing contiguous about the id's in my example link. Meaning, don't hyperfocus on type=range in its EXPLAIN output. The speed is arrived at via the Optimizer (the CBO).

The cardinality in my example is very high (4.3 Million). The target id counts are relatively low (1000). The index is used.

Your situation may be the opposite: your cardinality might be incredibly low, like 3, and the optimizer decides to abandon use of the index.

To check your index cardinality, see the Manual Page SHOW INDEX Syntax.

A simple call such as:

show index from ratings;

+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ratings |          0 | PRIMARY  |            1 | id          | A         |     4313544 |     NULL | NULL   |      | BTREE      |         |               |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Community
  • 1
  • 1
Drew
  • 24,851
  • 10
  • 43
  • 78
  • Does this means I have to change the table to InnoDB? I don't see any changes in EXPLAIN at all. – Red Romanov Jun 30 '16 at 04:18
  • You need volume. Ignore explain for small tables (as it relates to the reality when it grows). Let me do an edit to this comment in 1 minute with a url. Edit: read this [Answer](http://stackoverflow.com/a/38108599) . – Drew Jun 30 '16 at 04:19
  • So from what I understand, it doesn't exploit the index because the table is very large and this query needs to process all rows? – Red Romanov Jun 30 '16 at 04:32
  • If that is your reality then correct. If the target count is sparse in quantity and the table is huge and the index is in good health (statistics / analyze) it uses it. If it has to use the lion-share of a huge table it won't use it. If the table has 20 rows it won't use it. – Drew Jun 30 '16 at 04:34
  • If it determines through low `cardinality` that it makes no sense, it may abandon the use of it. [Show Index](http://dev.mysql.com/doc/refman/5.7/en/show-index.html) shows cardinality. My example has extremely high cardinality, and sparse targets. It is more than happy to use my index. – Drew Jun 30 '16 at 04:36
  • I get it. I tried indexing TPerCode field which has higher cardinality and selected it, it uses the index. Can I mark this as an answer? – Red Romanov Jun 30 '16 at 04:42
  • Works for me. I can tweak my answer. – Drew Jun 30 '16 at 04:43
0

Neither MyISAM, nor InnoDB, is likely to use an index when "too much" of the table needs to be fetched.

IN (5,6) may mean 2/12ths of the table needs to be scanned? Or maybe the data is biased such that those two months have more than their share of rows?

The reason the optimizer might eschew the index in cases like this...

When using such an index, it needs to spend a lot of time bouncing between the index (one BTree) and the data.

When not using the index, it simply cruises through the data, ignoring 10/12ths of the rows. This may actually be faster.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • If `SELECT * FROM dt_prdtime WHERE TMonth = 5` uses the index, why `IN (5,6)` doesn't? It's practically the same, just one more number than `TMonth = 5` to find. How is it different? Why does it scan the whole table when it's a lot faster to find these numbers in the index? – Red Romanov Jul 02 '16 at 08:02
  • The cutoff is not exact. Often it is around 20%. One month is 8% (if evenly distributed); two months is 17%. Perhaps the statistics decided to cutoff below 17% in your case. Let's carry it farther, does it make sense for `IN(1,2,3,4,5,6,7,9,10,11)`? – Rick James Jul 02 '16 at 15:43