0
select
    tableA.field1,
    tableB.field2
from
    tableA
    inner join tableB on table1.someField = table2.someField
where
    tableA.field1 = 'value1'
    or tableB.field2 = 'value2'

I have good indexes on both tableA.field1 and tableB.field2. If I change the where clause from "or" to "and", then query is fast. However when using an or it seems it can only make use of one of the two indexes.

Both tables contains several 100,000 records.

Is there anyway to redesign the query to make use of both indexes when having an or in the where clause?

forpas
  • 160,666
  • 10
  • 38
  • 76
srayner
  • 1,799
  • 4
  • 23
  • 39
  • 1
    Possible duplicate of [MySQL: how to index an "OR" clause](https://stackoverflow.com/questions/2829544/mysql-how-to-index-an-or-clause) – Sirmyself Jul 29 '19 at 13:51

1 Answers1

0

I would re-write the query to combine two separate sets, avoiding the OR in the WHERE clause.

Something like this:

( SELECT a1.field1
       , b1.field2
    FROM tableA a1
    JOIN tableB b1
      ON b1.somefield = a1.somefield
   WHERE a1.field1 = 'value1'
)   
UNION ALL
( SELECT a2.field1
       , b2.field2
    FROM tableA a2
    JOIN tableB b2
      ON b2.somefield = a2.somefield
   WHERE b2.field2 = 'value2'
  HAVING NOT (a2.field1 <=> 'value1')
)

That query should be able to make effective use of these indexes, for the SELECT

 ... ON tableA (field1,somefield,...)
 ... ON tableB (somefield,field2,...)

and for the second SELECT

 ... ON tableA (somefield,field1,...)
 ... ON tableB (field2,somefield,...)

Use EXPLAIN to see the execution plan


Another way to look at the set being returned is as the combination of three sets...

  • rows that match on both a.field1 and b.field2
  • rows that match on a.field1 but not on b.field2
  • rows that match on b.field2 but not on a.field1

The first subset would the original query, replacing OR with AND.

Each of the last two subsets could be returned

SELECT ...
  FROM a
  JOIN b
    ON ...
 WHERE a.field1 = 'value1'
   AND NOT ( b.field2 <=> 'value2' )

and

SELECT ...
  FROM a
  JOIN b
    ON ...
 WHERE b.field2 = 'value2'
   AND NOT ( a.field1 <=> 'value1' )

Provide suitable indexes for each SELECT, and combine the sets with UNION ALL set operator.

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • Thanks. I came to the conclusion that I would need a union but wasn't quite sure, so good to know that you have same opinion. I didn't think I needed the HAVING clause in the second query. I also didn't know about the <=> operator, which is good to know, but do we need it here? Also couldn't we just use UNION instead of UNION ALL? – srayner Jul 29 '19 at 14:44
  • The `UNION` set operator removes duplicate values; the `UNION ALL` set operator does not. So in terms of replicating the result from the original query, the `UNION ALL` will do that; the `UNION` might also do that, but that would be a more specialized case rather than the general case. From a performance standpoint, `UNION`is going to cause a "Using filesort" operation. Whether to use `UNION ALL` or `UNION` really depends on the specification. Where either will satisfy the spec, `UNION ALL` will give better performance. – spencer7593 Jul 29 '19 at 16:18
  • The condition in the HAVING clause of the second SELECT is meant to avoid returning the same rows that were already returned by the first SELECT. we could re-write that `HAVING (a.field1 <> 'value1' OR a.field1 IS NULL)`. To my mind, `NOT (a.field1 <=> 'value1')` is just a more straightforward way to negate `a.field1 = 'value1` (in the general case, where we don't have any guarantee that `field1` is non-NULL. If we have some sort of guarantee that field1 is non-NULL (which we don't see any evidence of in the question) then we could just do `a.field1 <> 'value1'` – spencer7593 Jul 29 '19 at 16:24