1

Assume I have a database set up and a table named MyTable which contains a large number of records (tens of thousands). Assume a query as follows...

select * from MyTable where ColumnX = 'X'

... returns just a handful of records (< 10). Let's say I wanted to filter this result set further to only those records where ColumnY matches 'Y1' or 'Y2'. Is it better from a speed and memory perspective to simply modify to the above query as follows...

select * from MyTable where ColumnX = 'X' and (ColumnY = 'Y1' or ColumnY = 'Y2')

... Or is it better to iterate over the (small) result set in code and filter out only those records where ColumnY matches 'Y1' or 'Y2'? The reason I ask is because I have been told that OR clauses are bad in database queries from a performance perspective (when dealing with large tables) and better avoided where possible.

Note: The scenario in which this applies for me is an Android application with a local SQLite database but I guess the question is a bit more generic than that.

Adil Hussain
  • 30,049
  • 21
  • 112
  • 147
  • given the complexity of your query, there is a good chance it is faster to filter in the query rather than after having loaded the results. (plus you can create an index on `(ColumnX, ColumnY)`, and get very fast results.) – njzk2 Sep 23 '14 at 14:42
  • @njzk2 your soultion is not upto the marked according to the question. Indexing its another part and does not related to this question. He wants to know the use of IN / OR clause and how it will impact on the sql query performance. –  Sep 23 '14 at 14:52
  • @ajay: the question is `Is it better to avoid OR clauses`. My comment suggests that the answer depends on the database structure, in particular the notion that having a properly indexed table can get you much faster results than in-memory /a posteriori/ filtering. And also that even without index, there is a good chance that an in-query filtering will be faster. – njzk2 Sep 23 '14 at 14:59
  • @ajay: to be more concise, `how it will impact (...) performance` depends a lot on whether there is an index or not. – njzk2 Sep 23 '14 at 15:00

3 Answers3

1

SQLite's documentation describes multiple optimizations that can be done on queries with OR, and says:

For any given query, the fact that the OR-clause optimization described here can be used does not guarantee that it will be used. SQLite uses a cost-based query planner that estimates the CPU and disk I/O costs of various competing query plans and chooses the plan that it thinks will be the fastest. If there are many OR terms in the WHERE clause or if some of the indices on individual OR-clause subterms are not very selective, then SQLite might decide that it is faster to use a different query algorithm, or even a full-table scan. Application developers can use the EXPLAIN QUERY PLAN prefix on a statement to get a high-level overview of the chosen query strategy.

In any case, implementing the OR by hand in your code is very likely to be slower than letting the database do it, because the database has to read and return all rows that match on ColumnX, even those that will not match on ColumnY.

Furthermore, the database already has code to do this filtering; implementing it again just increases the complexity of your code and the chances of errors.

The statement that "OR clauses are bad in database queries from a performance perspective (when dealing with large tables) and better avoided where possible" is not quite true; if you need the OR, all alternatives are worse.

CL.
  • 173,858
  • 17
  • 217
  • 259
0

You can try with IN clause :

select * from MyTable where ColumnX = 'X' and ColumnY in ('Y1','Y2')
user3851404
  • 190
  • 1
  • 2
  • That doesn't answer my question. The principle is the same: `IN` works the same way as `OR`. My question is about performance. Whether it's better to do the additional filter in the database query or in code on the initial query's result set. – Adil Hussain Sep 23 '14 at 14:41
  • 1
    Before posting answer you should know or understand whether it is right or not. You solution is absolutely wrong. IN clause result in poor performance as compared to OR clause in SQL query –  Sep 23 '14 at 14:42
  • 2
    @ajay: this is an interesting affirmation. Do you have any source to back that? (I would be interested in understand `why` and `how much`) – njzk2 Sep 23 '14 at 14:43
  • Please check above my answer –  Sep 23 '14 at 14:47
  • @ajay You are wrong; your answer says "the execution plan for both queries will likely be equivalent". – CL. Sep 23 '14 at 17:33
-2

Yes, Ajay you are right, thank you . Another way to solve this will be to either use temp table, or With clause.

-- Following solution for Oracle , need little change for each db product to replace DUAL

with YT (columnY)( select 'Y1' as columnY from DUAL
         union 
         select 'Y2' from DUAL)
select MT.* 
from MyTable as MT
   , YT
where  MT.ColumnX = 'X' 
and MT.ColumnY = YT.columnY
user3851404
  • 190
  • 1
  • 2