1

I found this example in the MySQL tutorial:

SELECT article, dealer, price
FROM   shop
WHERE  price=(SELECT MAX(price) FROM shop);

My question: is the subquery (SELECT MAX(price) FROM shop) executed only once, or is it executed repeatedly until the max price for the query is found?

In terms of performance, is this other solution better?

SELECT s1.article, s1.dealer, s1.price
FROM shop s1
LEFT JOIN shop s2 ON s1.price < s2.price
WHERE s2.article IS NULL;

Thanks.

fredt
  • 24,044
  • 3
  • 40
  • 61
BMario
  • 13
  • 3

7 Answers

2

The subquery is non-correlated, so any sensible implementation will only evaluate it once. Note, though, that MySQL does have a problem with IN, where the semantically equivalent

SELECT article, dealer, price
FROM   shop
WHERE  price IN (SELECT MAX(price) FROM shop);

leads to the subquery being evaluated multiple times.
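
A quick way to check this yourself (a sketch, assuming MySQL and the shop table from the question) is to run EXPLAIN on the IN variant:

EXPLAIN
SELECT article, dealer, price
FROM   shop
WHERE  price IN (SELECT MAX(price) FROM shop);

On older MySQL versions the subquery row in the EXPLAIN output may be labelled DEPENDENT SUBQUERY (re-evaluated per outer row), whereas the = form typically shows a plain SUBQUERY that is evaluated once.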

As far as evaluating performance goes, you would need to look at the explain plan for both queries in your particular RDBMS.

The most efficient solution might be to use SELECT TOP .. WITH TIES or equivalent if you have a covering index on the price column and your RDBMS has such a construct.
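
In T-SQL, for example, that construct would look roughly like this (a sketch, assuming the shop table from the question and an index on price):

SELECT TOP (1) WITH TIES article, dealer, price
FROM   shop
ORDER BY price DESC;

WITH TIES returns every row that shares the maximum price, and with a suitable index on price the engine only has to read the top of that index.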

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
1

You've got tags for MySQL, T-SQL and PL/SQL; I suspect the answer is different for each.

The answer could also depend on what indexes you have and how unique the values in the [price] field are.

Run the query analyser to see what the actual query-plan is.
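
A sketch of how that might look in each of the tagged systems (the statements are standard for each product, but the output format varies by version):

-- MySQL
EXPLAIN
SELECT article, dealer, price FROM shop
WHERE  price = (SELECT MAX(price) FROM shop);

-- SQL Server (T-SQL): show the estimated plan instead of running the query
SET SHOWPLAN_XML ON;
GO
SELECT article, dealer, price FROM shop
WHERE  price = (SELECT MAX(price) FROM shop);
GO
SET SHOWPLAN_XML OFF;
GO

-- Oracle
EXPLAIN PLAN FOR
SELECT article, dealer, price FROM shop
WHERE  price = (SELECT MAX(price) FROM shop);
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);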

Keith
  • 150,284
  • 78
  • 298
  • 434
1

To answer your question: in MySQL, the scalar subquery (SELECT MAX(price) FROM shop) is run once and then passed to the main query as a value.

So that query is as quick as anything else you could come up with.

mikeq
  • 817
  • 5
  • 5
0

I used SQL Server 2008 to test these three variations of the query. In my testing I queried the AdventureWorks DB using ProductInventory in the Production schema. The three queries are:

declare @max int
Select @max =  MAX(Quantity) FROM [AdventureWorks].[Production].[ProductInventory]
SELECT TOP 1000 [ProductID]
      ,[LocationID]
      ,[Shelf]
      ,[Bin]
      ,[Quantity]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks].[Production].[ProductInventory]
  WHERE Quantity = @max

SELECT TOP 1000 [ProductID]
      ,[LocationID]
      ,[Shelf]
      ,[Bin]
      ,[Quantity]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks].[Production].[ProductInventory]
  WHERE Quantity = (Select MAX(Quantity) FROM [AdventureWorks].[Production].[ProductInventory])


SELECT TOP 1000 AW1.[ProductID]
      ,AW1.[LocationID]
      ,AW1.[Shelf]
      ,AW1.[Bin]
      ,AW1.[Quantity]
      ,AW1.[rowguid]
      ,AW1.[ModifiedDate]
  FROM [AdventureWorks].[Production].[ProductInventory] AW1
LEFT JOIN [AdventureWorks].[Production].[ProductInventory] AW2 ON AW1.Quantity < AW2.Quantity
WHERE AW2.ProductID IS NULL; 

Using the "Show estimated query plan" icon I can compare the execution plans for the three cases. The results are:

  1. Declaring a variable and filling the variable is 5% faster than a sub-select in the where clause.
  2. The join is 98% slower than the subselect
  3. The join is 99% slower than the variable

My suggestion is to declare a variable and fill it, then use the variable in the WHERE clause.

RC_Cleland
  • 2,274
  • 14
  • 16
  • There shouldn't be any particular benefit of using a variable. Maybe the two 0% batches are actually between 0.25% and 0.49% meaning that when both steps are added together it adds up to a display value that rounds to 1%. Indeed if you look at the plans you will see they both do exactly the same work. – Martin Smith Nov 20 '10 at 13:01
  • Although having said that assigning to a variable can be useful in the following situations. (1) To use with `OPTION (RECOMPILE)` in order to get better cardinality estimates for the rest of the query. (2) In parallel execution plans it may allow better plans. (Source: "Inside Microsoft SQL Server 2005 Query Tuning and Optimization") – Martin Smith Nov 21 '10 at 00:13
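
A sketch of the first situation described in the comment above, using the same AdventureWorks table: with OPTION (RECOMPILE) the optimizer can see the actual value of @max when the statement is compiled and base its row-count estimates on it.

DECLARE @max int;

SELECT @max = MAX(Quantity)
FROM   [AdventureWorks].[Production].[ProductInventory];

SELECT [ProductID], [LocationID], [Shelf], [Bin],
       [Quantity], [rowguid], [ModifiedDate]
FROM   [AdventureWorks].[Production].[ProductInventory]
WHERE  Quantity = @max
OPTION (RECOMPILE);  -- recompile with the known value of @max
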
-1

I can beat both:

SELECT article, dealer, price
FROM   shop
WHERE  price=MAX(price)

Edit: Whoops, not working on my test server :/ (an aggregate like MAX isn't allowed directly in a WHERE clause)

J V
  • 11,402
  • 10
  • 52
  • 72
-1

Using a join should be better than a nested sub-query.

Pigol
  • 1,221
  • 2
  • 13
  • 18
  • Any evidence for this assertion? In the OP's case the subquery looks massively better, as 2 index seeks would easily beat an index scan and a triangular join. Obviously the optimiser is free to transform the queries anyway though. – Martin Smith Nov 20 '10 at 12:18
  • 1
    -1. This is a totally false statement. I don't even know where to begin. – Dave Markle Nov 20 '10 at 13:14
-2

You should not care. Any decently modern database server will understand what you want and perform the query in the most efficient way it can. SQL is declarative, not imperative (i.e. you say what results you want, not how they are to be retrieved).

erikkallen
  • 33,800
  • 13
  • 85
  • 120
  • In theory yes. In practice the optimiser only spends a certain amount of time applying transformation rules and the way a query is written can heavily influence the "how". You need look no further than [RC_Cleland's answer](http://stackoverflow.com/questions/4232570/performance-of-join-vs-select-where-with-an-example/4232785#4232785) for proof of this. – Martin Smith Nov 20 '10 at 13:07
  • 3
    I have to disagree here. SQL is not just a black box that magically works. It's important to understand how your queries will be evaluated before running them -- and being ignorant of that will often lead to all kinds of performance problems down the road -- some of which will not be easy to optimize away. – Dave Markle Nov 20 '10 at 13:18
  • If this were true then tools such as `explain plan` wouldn't exist. The truth of the matter is that differently formed SQL will give different execution plans, so it's still important to know how to tune. – Donnie Nov 21 '10 at 13:42
  • Yes, occasionally different SQL can give different execution plans. Usually, though, including simple cases like the one the OP asked about, it should not. In the linked RC_Cleland answer, the problem is that he does a non-equijoin in the slow case, and those are bad (pretty much nested loops as the only choice). – erikkallen Nov 22 '10 at 00:20
  • And if you want different execution plans, you should NOT do it by tweaking the SQL until it appears to work. You should do it with the tools your database gives you which are intended for use with this. In Oracle, that means hints, in SQL server it means hints and/or plan guides. – erikkallen Nov 22 '10 at 00:21
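
For illustration only, a sketch of what such hints look like on the original query (the Oracle index name here is hypothetical, and whether either hint actually helps depends on the schema and data):

-- Oracle: hint the optimizer towards a (hypothetical) index on shop(price)
SELECT /*+ INDEX(shop shop_price_idx) */ article, dealer, price
FROM   shop
WHERE  price = (SELECT MAX(price) FROM shop);

-- SQL Server: a query-level hint on the same statement
SELECT article, dealer, price
FROM   shop
WHERE  price = (SELECT MAX(price) FROM shop)
OPTION (MAXDOP 1);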