3

There is a table with these columns:

Name Type
Id int
DateTime datetime(6)
CompanyId int (FK)
IsExcluded tinyint(1)

There are 2 BTREE indexes:

  1. CompanyId
  2. ComapnyId,DateTime,IsExcluded

Following select is using the first index with only one column and this select takes 2.3sec. When I force the second index select takes 0.015sec. Also when I decrease DateTime range by one day MySQL is using the second index without forcing.

select IsExcluded,DateTime,CompanyId FROM table where 
IsExcluded = 0 and 
DateTime >= '2022-06-02' and 
DateTime < '2022-09-22' and 
CompanyId = 1;

I understand that if more than ~20%-30% of rows are selected, MySQL could decide to ignore index, but I don't understand why MySQL is selecting different obviously not the most suitable index.

Is there any way how to setup or "learn" MySql that second index for this query is the most suitable (without inserting anything else to query) ?

Davidm176
  • 163
  • 1
  • 12
  • Have you asked MySQL to [EXPLAIN](https://dev.mysql.com/doc/refman/8.0/en/using-explain.html) the select? – Andy Jones Sep 28 '22 at 13:57
  • yes, I check it and query cost is lower with index with one column than index with 3 columns (even it is much more slower) - probably DateTime in index makes index huge and makes it more expensive – Davidm176 Sep 28 '22 at 15:04

3 Answers3

1

Is there any way how to setup or "learn" MySql that second index for this query is the most suitable (without inserting anything else to query) ?

You can in theory tweak the index statistics manually to influence the optimizer's choice. See https://dev.mysql.com/blog-archive/histogram-statistics-in-mysql/

But frankly, I know of no developers who use this feature. It's just too difficult to figure out how to use it, and any custom index statistics you choose would be likely to get out of date very quickly.

What is more common is to use index hints to tell the optimizer to consider only a specific index, or to ignore other indexes. See https://dev.mysql.com/doc/refman/8.0/en/index-hints.html

Example:

SELECT IsExcluded, DateTime, CompanyId 
FROM table USE INDEX (myindex_with_three_columns)
WHERE IsExcluded = 0 
 AND DateTime >= '2022-06-02' 
 AND DateTime < '2022-09-22' 
 AND CompanyId = 1;

(I'm guessing at the name of your 3-column index.)

I would guess the reason your index is not used is that the DateTime portion is not helping enough to justify the wider index (i.e. your date range covers too many rows), and the IsExcluded portion is not used at all, because it follows a column that is used in a range condition. So the optimizer chooses an index that's more compact, so it can load the index into RAM in fewer I/O reads.

What's important to understand about column order in the index is that the columns involved in equality conditions should be to the left. Then you can have one column that is used in an inequality or range condition, and any subsequent columns in the index are not used for searching or sorting.

In your case you have an index on (CompanyId,DateTime,IsExcluded), but the second column is used in a range condition, so the third column isn't used for the search. At best it could try to use index condition pushdown, but that's not as good as really narrowing down the search.

The better index would be to order the columns so the DateTime column is last. Either of the columns used for equality may be first, but they both have to be before the DateTime column. Either of these orders would be better in this case:

(CompanyId,IsExcluded,DateTime)
(IsExcluded,CompanyId,DateTime)
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

Updated

As per my understanding the order of keys does matter. Check also related article

MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

More info at MySQL doc

Isidoros Moulas
  • 520
  • 3
  • 10
  • 1
    order of columns in select and where condition doesnt matter – Davidm176 Sep 28 '22 at 13:24
  • @Davidm176 check the updated answer. MySQL doc notes that the order does matter. This was a very nice question indeed. – Isidoros Moulas Sep 28 '22 at 13:28
  • 1
    order of index MATTER but orded in where conditions and selected columns DOESNT MATTER – Davidm176 Sep 28 '22 at 13:32
  • @Davidm176 please check the documentation. There is a nice example at the end with columns lastname and firstname that explains the issue in detail. https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html – Isidoros Moulas Sep 28 '22 at 13:34
  • 1
    @IsidorosMoulas No, the placement of conditions in the where (or select) does not matter. You may be misunderstand the documentation. The 2 situations in the documentation with lastname/firstname where the index cannot be used is because it includes `[or] first_name = ...`. For that, you would need an index that has first_name as the first column. It has nothing to do with where in the query the condition is placed. So yes: the column order in the index matters. But it's not related to the position of conditions, but to the existence of conditions (and their type). – Solarflare Sep 28 '22 at 15:52
  • @Solarflare, well i agree with you but definitely i want to test this with a large dataset to see the results. I pretty sure that a lot of people tried this too. – Isidoros Moulas Sep 28 '22 at 16:44
0

Given a 'smaller' INDEX(a) and a 'larger' INDEX(a,b), the Optimizer will often use the smaller index even though the larger one would do more filtering.

This is a recurring problem, I have seen it in many situations; there needs to be a bug report encouraging them to fix the Optimizer.

The workaround is to DROP the smaller index.

Furthermore, as Bill points out, your larger index is not optimal. In general, move any range column to the end of the index.

There's another point here: For that SELECT your 3-column index should have been picked because it is "covering". That means that all of the columns anywhere in the SELECT are in the single INDEX (in any order). (Bill's recommendations are both covering and faster.)

Rick James
  • 135,179
  • 13
  • 127
  • 222