2

I have a table the_table with attributes the_table.id, the_table.firstVal and the_table.secondVal (the primary key is the_table.id, of course).

After defining an index over the first non-key attribute like this:

CREATE INDEX idx_firstval  
ON the_table (firstVal);

The EXPLAIN result for the following disjunctive (OR) query

SELECT * FROM the_table WHERE the_table.firstVal = 'A' OR the_table.secondVal = 'B';

is

| id    | select_type | table     | type    | possible_keys | key   | key_len   | ref   | rows  | Extra
| 1     | SIMPLE      | the_table | ALL     | idx_firstval  | NULL  | NULL      | NULL  | 3436  | Using where

which shows that the index idx_firstval is not used. Now, the EXPLAIN result for the following conjunctive (AND) query

SELECT * FROM the_table WHERE the_table.firstVal = 'A' AND the_table.secondVal = 'B';

is

| id    | select_type   | table     | type  | possible_keys | key           | key_len   | ref   | rows  | Extra 
| 1     | SIMPLE        | the_table | ref   | idx_firstval  | idx_firstval  | 767       | const | 124   | Using index condition; Using where

which shows the index in use, this time around.

Why is MySQL choosing not to use indexes for the disjunctive query, but it is for the conjunctive one?

I've scoured SO, and as suggested by the answer in this thread, "using OR in a query will often cause the Query Optimizer to abandon use of index seeks and revert to scans". However, this doesn't answer why it happens, just that it does.

Another thread tries to answer why a disjunctive query doesn't use indexes, but I think it fails at doing so - it is merely concluded that the OP is using a small database. I'm wanting to know the difference between the disjunctive and the conjunctive case.

Mew
  • 368
  • 1
  • 11

2 Answers2

2

Because MySQL execution plan uses only one index for a table.

If MySQL uses range scan on idx_firstval to satisfy equality predicate on firstVal column, that leaves MySQL still needing to check the condition on secondVal column.


With the AND, MySQL only needs to check the rows returned from the range scan of the index. The set of rows that need to be checked is constrained by the condition.


With the OR, MySQL needs to check the rows that were not returned by the index range scan, all the rest of the rows in the table. Without an index, that means a full scan of the table. And if we're doing a full scan of the table to check secondVal, then it will be less expensive to check both conditions on the scan (i.e. a plan that includes an index accesses as well as a full scan will be more expensive.)

(If a composite index containing both firstVal and secondVal is available, then for the OR query, it is conceivable that optimizer might think its less expensive to check all the rows in the table by doing a full index scan, and then looking up the data pages.)


When we understand what operations are available to the optimizer, that's leads us to avoiding the OR and to rewrite the query, to return an equivalent resultset, with a query pattern that more explicitly defines a combination of two sets

SELECT a.*
  FROM the_table a
 WHERE a.firstVal = 'A'

UNION ALL

SELECT b.*
  FROM the_table b
 WHERE b.secondVal = 'B'
   AND NOT ( b.firstVal <=> 'A' )

(Add an ORDER BY if we expect rows to be returned in a particular order)

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • Regarding "If a composite index containing both firstVal and secondVal is available (...)": I checked the composite index as suggested by @TimBiegeleisen, and the `OR` still does a full table scan, without using the index. That's really strange to me; I would think the optimiser desires using such indexes in both cases, and yet it doesn't for the disjunctive query. – Mew May 04 '20 at 16:35
  • For trivial sets, the cost of a full scan is low, so the optimizer may favor an access plan with a full table scan over using an index. If the plan using the index is going to have access pages from the index *and* pages from the data table... for a really small table, not at all surprising that the cost estimate is lower for the plan that ignores the index and just get the pages from the table. – spencer7593 May 04 '20 at 16:52
  • With the * in the SELECT list, MySQL will need to visit data pages in the underlying table for any columns not available in the index. If a *covering* index for the query is available, we are more likely to see an index being used. It won't be a range scan operation, since it's needs to check every single row i.e. a full scan, but it can do the scan on an index instead of the table. Try ditching the * in the SELECT list, and have the query reference only columns available in the index e.g. `SELECT t.firstVal, t.secondVal FROM the_table t WHERE ... OR ...` – spencer7593 May 04 '20 at 18:47
1

I am surprised that MySQL is using an index for either of the two queries. The correct index to use here would be a composite index which covers the two columns in the WHERE clause:

CREATE INDEX idx ON the_table (firstVal, secondVal);

As to why MySQL is using the index in the second case, one possibility might be if most of the records in the_table have firstVal values which are not A. In this case, simply knowing that the equality the_table.firstVal = 'A' is false would mean that the entire outcome of the WHERE clause would be known (as false). So, the answer as to why the index is being used could have something to do with the cardinality of your exact data. But in any case, consider using the composite index to cover all bases.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • You're not wrong that 'A' is a rare value in that table (124 entries), so I did some additional queries for context. The two most prevalent values (1747 and 1446 entries respectively, call them 'C' and 'D') both have the exact same `EXPLAIN`s. Defining the composite index as per your suggestion, the `AND` switches over to it, but the `OR` still does a full table scan. – Mew May 04 '20 at 16:29
  • @Mew I don't know your actual data or how many records there are, but sometimes a SQL database won't even use an index if there are too few records. – Tim Biegeleisen May 04 '20 at 16:38
  • 1
    @Mew: if the composite index is a covering index for the query, we might see a full index scan for the `OR` query, if all of the columns referenced in the query are included in the index. e.g. a query that ditches the `*` in the SELECT list and instead does SELECT firstVal, secondVal FROM the_table ...` (the optimizer can't make use of a range scan operation, it still needs to check every row, but if it can satisfy the query entirely from the index without needing to lookup pages in the underlying table, we call that a *covering* index for the query). – spencer7593 May 04 '20 at 16:38
  • @TimBiegeleisen: Ah, maybe I should've clarified that specifically: 3436 rows in `the_table`, as displayed by the first output in my post (in column "`rows`"). – Mew May 04 '20 at 16:39
  • That's a pretty small table. My guess is that the index won't help as much in the `OR` case, so MySQL is just falling back to a table scan. – Tim Biegeleisen May 04 '20 at 16:43
  • @spencer7593: Maybe you should clarify that in your answer, because that was actually a neat insight. I checked on the database, and it's true that if I swap out the `*` for a `the_table.firstVal, the_table.secondVal` (so, no longer asking for `the_table.id`), the `EXPLAIN` shows `type | ìndex` and the full index is indeed used. – Mew May 04 '20 at 16:46
  • The boost you get for covering the `SELECT` clause is typically small. – Tim Biegeleisen May 04 '20 at 16:50
  • @TimBiegeleisen: On a similar 26789-row table, the `OR` still doesn't use the composite index, but that might still be considered "small". – Mew May 04 '20 at 16:58