1

I found this example in the MySQL tutorial:

SELECT article, dealer, price
FROM   shop
WHERE  price=(SELECT MAX(price) FROM shop);

My question: is the subquery (SELECT MAX(price) FROM shop) executed only once, or is it executed repeatedly until the max price for the query is found?

In terms of performance, is this other solution better?

SELECT s1.article, s1.dealer, s1.price
FROM shop s1
LEFT JOIN shop s2 ON s1.price < s2.price
WHERE s2.article IS NULL;

Thanks.

fredt
  • 24,044
  • 3
  • 40
  • 61
BMario
  • 13
  • 3

7 Answers

2

The subquery is non-correlated, so any sensible implementation will only evaluate it once. Note, though, that MySQL does have a problem with IN, where the semantically equivalent

SELECT article, dealer, price
FROM   shop
WHERE  price IN (SELECT MAX(price) FROM shop);

leads to the subquery being evaluated multiple times.
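
A quick way to check this yourself (a sketch, assuming MySQL and the shop table from the question) is to run EXPLAIN on the IN variant:

EXPLAIN
SELECT article, dealer, price
FROM   shop
WHERE  price IN (SELECT MAX(price) FROM shop);

On older MySQL versions the subquery row in the EXPLAIN output may be labelled DEPENDENT SUBQUERY (re-evaluated per outer row), whereas the = form typically shows a plain SUBQUERY that is evaluated once.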

As far as evaluating performance goes, you would need to look at the explain plan for both queries in your particular RDBMS.

The most efficient solution might be to use SELECT TOP .. WITH TIES or equivalent if you have a covering index on the price column and your RDBMS has such a construct.
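
In T-SQL, for example, that construct would look roughly like this (a sketch, assuming the shop table from the question and an index on price):

SELECT TOP (1) WITH TIES article, dealer, price
FROM   shop
ORDER BY price DESC;

WITH TIES returns every row that shares the maximum price, and with a suitable index on price the engine only has to read the top of that index.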

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
1

You've got tags for MySQL, T-SQL and PL/SQL; I suspect the answer is different for each.

The answer could also depend on what indexes you have and how unique the values in the [price] field are.

Run the query analyser to see what the actual query-plan is.
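
A sketch of how that might look in each of the tagged systems (the statements are standard for each product, but the output format varies by version):

-- MySQL
EXPLAIN
SELECT article, dealer, price FROM shop
WHERE  price = (SELECT MAX(price) FROM shop);

-- SQL Server (T-SQL): show the estimated plan instead of running the query
SET SHOWPLAN_XML ON;
GO
SELECT article, dealer, price FROM shop
WHERE  price = (SELECT MAX(price) FROM shop);
GO
SET SHOWPLAN_XML OFF;
GO

-- Oracle
EXPLAIN PLAN FOR
SELECT article, dealer, price FROM shop
WHERE  price = (SELECT MAX(price) FROM shop);
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);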

Keith
  • 150,284
  • 78
  • 298
  • 434
1

To answer your question: in MySQL, the scalar subquery (SELECT MAX(price) FROM shop) is run once and then passed to the main query as a value.

So that query is as quick as anything else you could come up with.

mikeq
  • 817
  • 5
  • 5
0

I used SQL Server 2008 to test these three variations of the query. In my testing I queried the AdventureWorks DB using ProductInventory in the Production schema. The three queries are:

declare @max int
Select @max =  MAX(Quantity) FROM [AdventureWorks].[Production].[ProductInventory]
SELECT TOP 1000 [ProductID]
      ,[LocationID]
      ,[Shelf]
      ,[Bin]
      ,[Quantity]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks].[Production].[ProductInventory]
  WHERE Quantity = @max

SELECT TOP 1000 [ProductID]
      ,[LocationID]
      ,[Shelf]
      ,[Bin]
      ,[Quantity]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks].[Production].[ProductInventory]
  WHERE Quantity = (Select MAX(Quantity) FROM [AdventureWorks].[Production].[ProductInventory])


SELECT TOP 1000 AW1.[ProductID]
      ,AW1.[LocationID]
      ,AW1.[Shelf]
      ,AW1.[Bin]
      ,AW1.[Quantity]
      ,AW1.[rowguid]
      ,AW1.[ModifiedDate]
  FROM [AdventureWorks].[Production].[ProductInventory] AW1
LEFT JOIN [AdventureWorks].[Production].[ProductInventory] AW2 ON AW1.Quantity < AW2.Quantity
WHERE AW2.ProductID IS NULL; 

Using the "Show estimated query plan" icon I can compare the execution plans for the three cases. The results are:

  1. Declaring a variable and filling the variable is 5% faster than a sub-select in the where clause.
  2. The join is 98% slower than the subselect
  3. The join is 99% slower than the variable

My suggestion is to declare a variable and fill it, then use the variable in the WHERE clause.

RC_Cleland
  • 2,274
  • 14
  • 16
  • There shouldn't be any particular benefit of using a variable. Maybe the two 0% batches are actually between 0.25% and 0.49% meaning that when both steps are added together it adds up to a display value that rounds to 1%. Indeed if you look at the plans you will see they both do exactly the same work. – Martin Smith Nov 20 '10 at 13:01
  • Although having said that assigning to a variable can be useful in the following situations. (1) To use with `OPTION (RECOMPILE)` in order to get better cardinality estimates for the rest of the query. (2) In parallel execution plans it may allow better plans. (Source: "Inside Microsoft SQL Server 2005 Query Tuning and Optimization") – Martin Smith Nov 21 '10 at 00:13
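
A sketch of the first situation described in the comment above, using the same AdventureWorks table: with OPTION (RECOMPILE) the optimizer can see the actual value of @max when the statement is compiled and base its row-count estimates on it.

DECLARE @max int;

SELECT @max = MAX(Quantity)
FROM   [AdventureWorks].[Production].[ProductInventory];

SELECT [ProductID], [LocationID], [Shelf], [Bin],
       [Quantity], [rowguid], [ModifiedDate]
FROM   [AdventureWorks].[Production].[ProductInventory]
WHERE  Quantity = @max
OPTION (RECOMPILE);  -- recompile with the known value of @max
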
-1

I can beat both:

SELECT article, dealer, price
FROM   shop
WHERE  price=MAX(price)

Edit: Whoops, not working on my test server :/ (an aggregate like MAX isn't allowed directly in a WHERE clause)

J V
  • 11,402
  • 10
  • 52
  • 72
-1

Using a join should be better than a nested sub-query.

Pigol
  • 1,221
  • 2
  • 13
  • 18
  • Any evidence for this assertion? In the OP's case the subquery looks massively better, as 2 index seeks would easily beat an index scan and a triangular join. Obviously the optimiser is free to transform the queries anyway though. – Martin Smith Nov 20 '10 at 12:18
  • 1
    -1. This is a totally false statement. I don't even know where to begin. – Dave Markle Nov 20 '10 at 13:14
-2

You should not care. Any decently modern database server will understand what you want and perform the query in the most efficient way it can. SQL is declarative, not imperative (i.e. you say what results you want, not how they are to be retrieved).

erikkallen
  • 33,800
  • 13
  • 85
  • 120
  • In theory yes. In practice the optimiser only spends a certain amount of time applying transformation rules and the way a query is written can heavily influence the "how". You need look no further than [RC_Cleland's answer](http://stackoverflow.com/questions/4232570/performance-of-join-vs-select-where-with-an-example/4232785#4232785) for proof of this. – Martin Smith Nov 20 '10 at 13:07
  • 3
    I have to disagree here. SQL is not just a black box that magically works. It's important to understand how your queries will be evaluated before running them -- and being ignorant of that will often lead to all kinds of performance problems down the road -- some of which will not be easy to optimize away. – Dave Markle Nov 20 '10 at 13:18
  • If this were true then tools such as `explain plan` wouldn't exist. The truth of the matter is that differently formed SQL will give different execution plans, so it's still important to know how to tune. – Donnie Nov 21 '10 at 13:42
  • Yes, occasionally different SQL can give different execution plans. Usually, though, including simple cases like the one the OP asked about, it should not. In the linked RC_Cleland answer, the problem is that he does a non-equijoin in the slow case, and those are bad (pretty much nested loops as the only choice). – erikkallen Nov 22 '10 at 00:20
  • And if you want different execution plans, you should NOT do it by tweaking the SQL until it appears to work. You should do it with the tools your database gives you which are intended for use with this. In Oracle, that means hints, in SQL server it means hints and/or plan guides. – erikkallen Nov 22 '10 at 00:21
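
For illustration only, a sketch of what such hints look like on the original query (the Oracle index name here is hypothetical, and whether either hint actually helps depends on the schema and data):

-- Oracle: hint the optimizer towards a (hypothetical) index on shop(price)
SELECT /*+ INDEX(shop shop_price_idx) */ article, dealer, price
FROM   shop
WHERE  price = (SELECT MAX(price) FROM shop);

-- SQL Server: a query-level hint on the same statement
SELECT article, dealer, price
FROM   shop
WHERE  price = (SELECT MAX(price) FROM shop)
OPTION (MAXDOP 1);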