0

I have the following query which is taking about a minute to run:

SELECT * FROM main_i i JOIN main_p p ON p.item_id=i.id
WHERE COALESCE(p.provider_title_id, i.provider_title_id) = "X"

While this looks like a very easy query, it takes forever to run, and the only way I've been able to solve it is using the following approach:

SELECT * FROM main_i i JOIN main_p p ON p.item_id=i.id
WHERE p.provider_title_id = "X"
UNION
SELECT * FROM main_i i JOIN main_p p ON p.item_id=i.id
WHERE i.provider_title_id = "X" AND p.provider_title_id IS NULL

This finishes instantaneously, however the query seems so stupid that there must be a better way to do this.

What should I be using here?

David542
  • 104,438
  • 178
  • 489
  • 842
  • What about `WHERE p.provider_title_id = "X" OR (i.provider_title_id = "X" AND p.provider_title_id IS NULL)`? Am I missing something that's leading you to union these instead? – Elizabeth Mar 26 '19 at 21:03
  • 1
    the use of function as coalesce don't allow the use of index .. – ScaisEdge Mar 26 '19 at 21:04
  • 1
    @EthanJ. just as slow. -- took 31.4s to run the query vs 10.4ms for the union. – David542 Mar 26 '19 at 21:04
  • Can you provide the output of EXPLAIN? – Mike Doe Mar 26 '19 at 21:06
  • 1
    MySQL's optimizer is just not that good. Wait some years and it may be able some day to see that an `OR` condition can be rewritten to `UNION ALL` and vice versa and come up with the same execution plan. – Thorsten Kettner Mar 26 '19 at 21:06

3 Answers3

2

In my experience, use of functions (such as COALESCE) stop db engines to optimize row selection upfront. DB engines aren’t sure of final outcome until they run every record resulting from joining tables through such functions. On the contrary, with your second query db engines exactly know what to filter just when creating initial set (before where is applied). You can perhaps pick this in the query explain plan.

As you would know, another way to achieve the same outcome would be following. This should run quicker, in a way proving the fact..

SELECT *
FROM main_i i JOIN
     main_p p
     ON p.item_id = i.id
WHERE p.provider_title_id = 'X'
OR
(p.provider_title_id IS NULL 
AND i.provider_title_id = 'X')
Gro
  • 1,613
  • 1
  • 13
  • 19
  • no this turns out to be just as slow as using a coalesce. Please see my comment above where it takes 31.4s to run this query vs 10ms to run it with a union. – David542 Mar 27 '19 at 00:09
1

In your first query the predicate COALESCE(p.provider_title_id, i.provider_title_id) = "X" is based on columns from two tables. This forces MySQL to use this predicate as a "filtering predicate" rather than as an "access predicate".

What does this mean in English? It means MySQL performs the cross product resulting from the JOIN using [hopefully fast] access predicates, but then it's forced to filter the whole result set using the condition above. The cross product probably results in a huge number of rows that are going to be discarded by the condition. Lots of work for a small result.

The second query uses simple predicates that MySQL is able to use for accessing rows. This time only few rows are accessed. The predicate filters some on them, but the total amount of work is very limited.

You would see this much clearer if you produced and compared the execution plans of each query.

The Impaler
  • 45,731
  • 9
  • 39
  • 76
0

This is fine, but you should use UNION ALL:

SELECT *
FROM main_i i JOIN
     main_p p
     ON p.item_id = i.id
WHERE p.provider_title_id = 'X'
UNION ALL
SELECT *
FROM main_i i JOIN
     main_p p 
     ON p.item_id = i.id
WHERE i.provider_title_id = 'X' AND p.provider_title_id IS NULL;

This version allows the optimizer to see two simpler subqueries. Each subquery can take advantage of a different set of indexes. In general, SQL does a poor job with inequalities and OR in JOIN conditions.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786