85

I just read part of an optimization article and segfaulted on the following statement:

When using SQL replace statements using OR with a UNION:

select username from users where company = ‘bbc’ or company = ‘itv’;

to:

select username from users where company = ‘bbc’ union
select username from users where company = ‘itv’;

From a quick EXPLAIN:

Using OR:

enter image description here

Using UNION:

enter image description here

Doesn't this mean UNION does in double the work?

While I appreciate UNION may be more performant for certain RDBMSes and certain table schemas, this is not categorically true as the author suggestions.

Question

Am I wrong?

hjpotter92
  • 78,589
  • 36
  • 144
  • 183
Jason McCreary
  • 71,546
  • 23
  • 135
  • 174
  • 1
    I would think that `UNION` is doing more work as it needs to remove the duplicates, where the filter would fetch the specified criteria. I'm surprised that it doesn't recommend using `IN`. – Kermit Dec 06 '12 at 18:58
  • 3
    It can be true, if mysql cant use indexes when there is `OR` in `WHERE` clause. – Ihor Romanchenko Dec 06 '12 at 18:58
  • @Igor Romanchenko, support this with an answer. – Jason McCreary Dec 06 '12 at 19:05
  • I would guess that the query analyzer would re-write the or statements into an IN statement, since they are equivalent – Darren Kopp Dec 06 '12 at 19:08
  • @Darren Kopp, I'm pretty sure it's the other way around. At least in MySQL anyway. – Jason McCreary Dec 06 '12 at 19:09
  • We use DB2 and interestingly enough, our DBAs had us rewrite OR statements with UNION statements because the explain came out better. I'm uncertain if this is always the case though as we have OR statements in other places that explain out fine. – tjg184 Dec 06 '12 at 19:40

6 Answers6

155

Either the article you read used a bad example, or you misinterpreted their point.

select username from users where company = 'bbc' or company = 'itv';

This is equivalent to:

select username from users where company IN ('bbc', 'itv');

MySQL can use an index on company for this query just fine. There's no need to do any UNION.

The more tricky case is where you have an OR condition that involves two different columns.

select username from users where company = 'bbc' or city = 'London';

Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.

The UNION solution is for this type of case.

select username from users where company = 'bbc' 
union
select username from users where city = 'London';

Now each sub-query can use the index for its search, and the results of the subquery are combined by the UNION.


An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.

My response is that that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.

The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    The quote I provided was the exact example in the article. So there was nothing to misinterpretation. I knew using `UNION` vs. `OR` was **not categorically true**. But I am marking this as correct since it addressed the original example as *incorrect*, while providing a use case of what the author likely *meant*. – Jason McCreary Dec 13 '12 at 19:16
  • Alas, the author may have been writing about a solution without understanding the cases where the solution is useful versus not needed. Or he may have been basing his knowledge on an ancient version of MySQL that didn't optimize `IN()` predicates as well. – Bill Karwin Dec 13 '12 at 19:19
  • @BillKarwin if the two different columns are indexed then won't MySQL perform "Index Merge Optimization" to get merged result of individual scans based on both the two indexes? – sactiw Nov 16 '15 at 13:51
  • @sactiw, Sometimes. In practice, I have found that the optimizer doesn't use the index merge when one would expect it to, so I don't rely on this. – Bill Karwin Nov 16 '15 at 14:10
  • 1
    I finally understand the need for UNION. Thanks! I am ordering your book from Amazon. – isapir Feb 03 '16 at 03:56
  • If the two columns in the WHERE clause came from different tables, would that negate the need to rewrite the query using UNION? If it's `tbl1.company = 'bbc' OR tbl2.city = 'london'` can MySQL make full use of indexes, since it can have one index per table? – Codemonkey Sep 14 '18 at 11:30
  • Try it out with EXPLAIN and see! – Bill Karwin Sep 14 '18 at 14:45
  • Just curiosity, the article is now old, but was it from w3schools by any chance... – Déjà vu Feb 19 '19 at 05:59
  • I have no idea, the OP did not post a link. I google for the phrase the OP is quoting, but I don't find it. – Bill Karwin Feb 19 '19 at 07:14
  • 1
    This explains alot, specially query times when in the Where clause it is validating columns from one Join OR another column from another Join. Thank you! – Gotham Llianen Mar 17 '21 at 16:31
5

Those are not the same query.

I don't have much experience with MySQL, so I am not sure what the query optimizer does or does not do, but here are my thoughts from my general background (primarily ms sql server).

Typically, the query analyzer can take the above two queries and make the exact same plan out of them (if they were the same), so it wouldn't matter. I would suspect that there is no performance difference between these queries (which are equivalent)

select distinct username from users where company = ‘bbc’ or company = ‘itv’;

and

select username from users where company = ‘bbc’ 
union
select username from users where company = ‘itv’;

Now, the question is, would there be a difference between the following queries, of which I actually don't know, but I would suspect that the optimizer would make it more like the first query

select username from users where company = ‘bbc’ or company = ‘itv’;

and

select username from users where company = ‘bbc’ 
union all
select username from users where company = ‘itv’;
Darren Kopp
  • 76,581
  • 9
  • 79
  • 93
  • 1
    + about the queries not being the same. Nonetheless, `UNION ALL` still yields the same `EXPLAIN` as `UNION`. – Jason McCreary Dec 06 '12 at 19:13
  • `UNION ALL` is usually faster than `UNION`. The latter implies `UNION DISTINCT`, thereby requiring a de-dup pass over a temp table. Newer versions avoid the temp table in certain situations, thereby helping more. The `or` example you have is always faster, because it can use `INDEX(company)` – Rick James Jun 21 '20 at 04:14
2

It depends on what the optimizer ends up doing based on the size of the data, indexes, software version, etc.

I would guess that using OR would give the optimizer a better chance at finding some efficiencies, since everything is in a single logical statement.

Also, UNION has some overhead, since it creates a reset set (no duplicates). Each statement in the UNION should execute pretty quickly if company is indexed... not sure it'd really be doing double the work.

Bottom line

Unless you really have a burning need to squeeze every bit of speed out of your query, it's probably better to just go with the form that best communicates your intention... the OR

Update

I also meant to mention IN. I believe the following query will give better performance than the OR (it's also the form I prefer):

select username from users where company in ('bbc', 'itv');

Community
  • 1
  • 1
David J
  • 659
  • 1
  • 9
  • 25
0

This my benchmark result


When use UNION - Query took 13.8699 seconds
row examined primary select type - 247685

when use OR - Query took 0.0126 seconds and row examined primary select type - 495371

MySQL uses one index for a query, so when we are using or then mysql use one column index and scan full table for another column

another part union same work can 2 times

that's why or is faster then union

-1

In almost all cases, the union or union all version is going to do two full table scans of the users table.

The or version is much better in practice, since it will only scan the table once. It will also use an index only once, if available.

The original statement just seems wrong, for just about any database and any situation.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Just to be clear, `UNION` will also use an index if available. But it will scan both *tables*. Just a smaller dataset then merge them back together. – Jason McCreary Dec 06 '12 at 21:52
  • no-no-no, if you use `or`/`in` you are going to use "index range scan", and in the case of `union`/`union all` you are going to use `non-unique` or even `primary key lookup` plus `index merge` – Yevgeniy Afanasyev Apr 10 '19 at 07:05
  • @YevgeniyAfanasyev . . . The explain results are quite clear that there are no indexes on the table. – Gordon Linoff Apr 10 '19 at 10:25
  • Thank you for your comment. Question does not state a condition of "no indexes on the table" as well as you answer. If you would put this at the start of your answer it would help people like me to look for their case scenarios. – Yevgeniy Afanasyev Apr 11 '19 at 02:02
  • 1
    `IN` and `OR` are the same. You can see this by getting the `EXPLAIN` to see that the Optimizer turns one into the other. – Rick James Jun 21 '20 at 04:15
-1

Bill Karwin's answer is pretty right. When the both part of the OR statement has its own index, it's better doing union because once you have a small subset of results, it's easier to sort them and eliminate duplicates. Total cost is almost less than using only one index (for one of the column) and table scan for the other column (because mysql only uses one index for one column).

It depends of the table's structure and needs generally but in large tables union gave to me better results.

Cagatay Gurturk
  • 7,186
  • 3
  • 34
  • 44