sql multiple whereIn performance

Question

Here is a simple query.

select * where 'id' in [1,2,3,4,5] AND 'sub_id' in [1];

If I switch these two wherein condition to this,

select * where 'sub_id' in [1] AND 'id' in [1,2,3,4,5];

With my first intuition, those two queries seem to have the same time complexity.

However, if I suppose 'id' column has 'N' data(extremely large), then the average time for the first query would take 5n, assuming 'sub_id' in [1] part would be O(1), since there are only 5 records to traverse, negligibly small. On the other hand, second query would take only 1n, since I am only searching for 1 record among N data(assuming 'id' in [1,2,3,4,5] part would be negligibly small too).

Is my assumption hypothetically correct? Assuming 'id' is not pk, thus no index is set.

So, my real question is, when I put wherein (...) condition in SQL, should I put smaller range first then only larger range in order to make the performance better? Please fix me if I'm wrong.

*select * where* is not valid SQL; assuming both columns (which should not be 'quoted' btw) are in the same table it won't make any difference as both have to be evaluated. Of course, you can easily test this yourself in less time than it takes to write a question on Stack Overflow. — Stu, Sep 27 '21 at 15:47
The optimizer will most likely make use of available statistics to figure out which condition to evaluate first regardless of the order you specify. — Radagast, Sep 27 '21 at 15:52
please ignore the syntax. I was just trying to give some example. I wonder if the order of multiple wherein condition affects the performance. — Danniel Lee, Sep 27 '21 at 15:54
It does not and should not matter how you arrange conditions in an AND operator — Salman A, Sep 27 '21 at 15:55
@PhilCoulson that make sense. May I ask if there is any reference or documentation I can lookup? — Danniel Lee, Sep 27 '21 at 15:56
SQL is a *declarative* language, in (almost) all cases you describe what you want and it's the job of the query engine to determine the best method; all modern query optimizers will build and evaluate multiple possible execution plans which may not and likely will not be as written. — Stu, Sep 27 '21 at 15:59
@DannielLee There are a lot of blogs and other answers on SO documenting and validating optimizer's "best-judgment" behavior. I couldn't find an official documentation specifically for the `where` clause, but for what it's worth, there is one for `joins` https://learn.microsoft.com/en-us/sql/relational-databases/performance/joins?view=sql-server-ver15#:~:text=order%20in%20which%20the%20tables%20will%20be%20joined The discussion on this question is helpful too https://stackoverflow.com/questions/11436469/does-the-order-of-where-clauses-matter-in-sql — Radagast, Sep 27 '21 at 16:32

score 1 · Answer 1 · answered Sep 27 '21 at 15:52

In the case of your current query, the order of which of the 2 ANDed terms in the WHERE clause gets evaluated first is moot, because by definition the SQL engine must check all parts of the AND expression to make a decision. If the query had ORed terms, then your ultimate question would make more sense. In that case, if you were to examine the execution plan, your query could evaluate in either of the 2 following orders:

SELECT * FROM yourTable WHERE id IN (1, 2, 3, 4, 5) OR sub_id IN (1);
SELECT * FROM yourTable WHERE sub_id IN (1) OR id IN (1, 2, 3, 4, 5);

In this case, how you write the SQL is also a moot point, because your SQL optimizer most likely would be smart enough to figure out the fastest way to evaluate the SQL.

Your responsibility as a data engineer would be to do something like put an index on (sub_id, id), or maybe (id, sub_id), which if used would speed up the execution of the WHERE clause.

Thank you for your explanation. SQL optimizer is always a black-box to me. — Danniel Lee, Sep 27 '21 at 16:02

sql multiple whereIn performance

1 Answers1