Which query is the fastest--the one using inner join or the one using a subquery?

Question

I have two queries giving me the same result. Which one is the most efficient?

The model:

Order {  
 orderId 
}

OrderItem {  
 orderItemId  
 orderId   
 productId 
}

Product {   
 productId 
}

The relationship between Order and OrderItem is one-to-many, and many OrderItems are related to one Product.

I'd like to retrieve the orders that are related to a specific product (in the requests the parameter :productId).

The first request with inner join:

Select distinct o
from Order o
inner join OrderItem oi
on o.id =
oi.orderId  and oi.productId = :productId

The second request with a count subquery in the where clause:

Select o 
from Order o
where 
    (Select count oi 
    from OrderItem oi 
    where oi.orderId = o.id and oi.productId = :productId
    ) > 0

I also use DB2 and Hibernate. And there is an index on each primary and foreign keys.

It’s an unanswerable question. It depends on the RDBMS you are using, the hardware you are running on, the data volume, indexing/partitioning, etc etc — NickW, Oct 16 '22 at 16:39
Please include the execution plan of each query. Also, please include the existing indexes in both tables. I would guess the first query is easier to optimize. — The Impaler, Oct 16 '22 at 17:58
Does this answer your question? [Join vs. sub-query](https://stackoverflow.com/questions/2577174/join-vs-sub-query) — philipxy, Oct 16 '22 at 23:28
[How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) [mre] [How to Optimize Queries in a Database - The Basics](https://stackoverflow.com/q/3191623/3404097) [What are your most common sql optimizations?](https://stackoverflow.com/q/1332778/3404097) [When and why are database joins expensive?](https://stackoverflow.com/q/173726/3404097) [Tips for asking a good SQL question](https://meta.stackoverflow.com/q/271055/3404097) [Asking query performance questions](https://dba.meta.stackexchange.com/q/3034/43932) — philipxy, Oct 16 '22 at 23:28
Each make and version of database server has its own ways of interpreting and satisfying queries. Please [edit] your question to add a [tag](//stackoverflow.com/help/tagging) for the database you use. [tag:postgresql]? [tag:mysql]? [tag:sql-server]? [tag:oracle]? [tag:google-bigquery]? [tag:amazon-redshift]? Another? — O. Jones, Oct 17 '22 at 13:58

score 1 · Accepted Answer · answered Nov 14 '22 at 11:50

Your queries aren't valid. You must say SELECT column rather than SELECT table. I assume you want this.

Select distinct o.id    -- COLUMN NAME here.
from Order o
inner join OrderItem oi
on o.id =
oi.orderId  and oi.productId = :productId

You want something straightforward: a set of ids from one table that meet a condition in a joined table. You could probably get this with a very simple one-table query.

SELECT DISTINCT oi.orderId AS Id
FROM OrderItem oi
WHERE oi.productId = :productId

But, unless there's an enforced foreign-key constraint between oi.orderId and o.Id this might get you some extra orderIds: ones that exist in the OrderItem table but not in the Order table.

Your first example is a good way -- an ideal way, even -- to overcome that objection.

Your second example is not so good. It has a correlated subquery. That is, the subquery refers to a column in a table alias -- o.id -- from the outer query. That means it must potentially repeat the evaluation of the subquery for each row from the outer query. That cannot be faster than your first query, even with the smartest query planner in the galaxy.

Now, if instead you want all the columns of your Order table (SELECT o.*) matching the particular query in the second table, use this.

SELECT o.*
  FROM Order o
 WHERE o.Id IN (
          SELECT oi.orderId
            FROM OrderItem oi
           WHERE oi.productId = :productId
     )

That's the fastest way to retrieve entire rows from one table that match some criterion in another table. Notice that the IN () operator automatically handles duplicate values.

Pro tip Use SELECT DISTINCT sparingly. If you find yourself using it on more than one column, you probably need to try to understand why you might have duplicates.

Excellent answer, thank you very much. The queries I described are object queries, not standard sql, sorry. The one with the 'In' seems to be the best indeed, I guess the result of the subquery is saved, and if the algorithm uses a Hashtable for the comparison that's even better. — kaizokun, Nov 15 '22 at 17:51
SQL is a *declarative language*. We tell it *what we want*, not *how to get it*. That's a tricky thing for typical procedural programmers to wrap our minds around. — O. Jones, Nov 15 '22 at 22:37
Yes and behind the declaration, there are procedures, some better then the others — kaizokun, Nov 17 '22 at 07:35

Which query is the fastest--the one using inner join or the one using a subquery?

1 Answers1