25

Is there any difference in the performance of the following three SQL statements?

SELECT * FROM tableA WHERE EXISTS (SELECT * FROM tableB WHERE tableA.x = tableB.y)

SELECT * FROM tableA WHERE EXISTS (SELECT y FROM tableB WHERE tableA.x = tableB.y)

SELECT * FROM tableA WHERE EXISTS (SELECT 1 FROM tableB WHERE tableA.x = tableB.y)

They all should work and return the same result set. But does it matter if the inner SELECT selects all fields of tableB, one field, or just a constant?

Is there any best practice when all statements behave equal?

Dustin Laine
  • 37,935
  • 10
  • 86
  • 125
Sebastian Dietz
  • 5,587
  • 1
  • 31
  • 39

9 Answers9

34

The truth about the EXISTS clause is that the SELECT clause is not evaluated in an EXISTS clause - you could try:

SELECT * 
  FROM tableA 
 WHERE EXISTS (SELECT 1/0 
                 FROM tableB 
                WHERE tableA.x = tableB.y)

...and should expect a divide by zero error, but you won't because it's not evaluated. This is why my habit is to specify NULL in an EXISTS to demonstrate that the SELECT can be ignored:

SELECT * 
  FROM tableA 
 WHERE EXISTS (SELECT NULL
                 FROM tableB 
                WHERE tableA.x = tableB.y)

All that matters in an EXISTS clause is the FROM and beyond clauses - WHERE, GROUP BY, HAVING, etc.

This question wasn't marked with a database in mind, and it should be because vendors handle things differently -- so test, and check the explain/execution plans to confirm. It is possible that behavior changes between versions...

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
15

Definitely #1. It "looks" scary, but realize the optimizer will do the right thing and is expressive of intent. Also ther is a slight typo bonus should one accidently think EXISTS but type IN. #2 is acceptable but not expressive. The third option stinks in my not so humble opinion. It's too close to saying "if 'no value' exists" for comfort.

In general it's important to not be scared to write code that mearly looks inefficient if it provides other benefits and does not actually affect performance.

That is, the optimizer will almost always execute your complicated join/select/grouping wizardry to save a simple EXISTS/subquery the same way.

After having given yourself kudos for cleverly rewriting that nasty OR out of a join you will eventually realize the optimizer still used the same crappy execution plan to resolve the much easier to understand query with embedded OR anyway.

The moral of the story is know your platforms optimizer. Try different things and see what is actually being done because the rampant knee jerks assumptions regarding 'decorative' query optimization are almost always incorrect and irrelevant from my experience.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Einstein
  • 4,450
  • 1
  • 23
  • 20
  • +1 because this is the only answer that gave a reason for preferring one option over the other instead of just saying "X is best practice". – Dave Costa Jan 08 '09 at 15:24
  • The poster didn't ask for reasons, just best practice! – Mitch Wheat Jan 08 '09 at 23:46
  • 8
    That's like saying "the question didn't ask me to justify my answer" in a philosphy exam... You give an answer and you want people to take note, you justify your answer. – MatBailie Aug 23 '09 at 17:15
  • The SELECT clause isn't evaluated in an EXISTS - see my answer for details. – OMG Ponies Nov 06 '10 at 21:53
  • @Dems : that analogy makes no sense. – Mitch Wheat May 27 '11 at 04:27
  • 1
    @Mitch - A response to a comment made over 20 months ago? And it made enough sense for 3 up votes? – MatBailie May 27 '11 at 08:16
  • 2
    @Mitch - True, but considering the low propensity of comments to attract votes, and combined with the date from which this was possible; 3 votes is a statistically significant indication of external agreement. Or something. – MatBailie May 27 '11 at 09:41
  • @Dems: anyway. I stll say the analogy made no sense; in view of the fact my original response was slightly tongue in cheek. But humour doesn't come over well in comments and devs aren't the most likely group of people to pick up on it! – Mitch Wheat May 27 '11 at 09:50
  • @DaveCosta - Is "it looks scary" really a valid reason for a style preference? – Thomas Jun 15 '12 at 23:31
7

I realize this is an old post, but I thought it important to add clarity about why one might choose one format over another.

First, as others have pointed out, the database engine is supposed to ignore the Select clause. Every version of SQL Server has/does, Oracle does, MySQL does and so on. In many, many moons of database development, I have only ever encountered one DBMS that did not properly ignore the Select clause: Microsoft Access. Specifically, older versions of MS Access (I can't speak to current versions).

Prior to my discovery of this "feature", I used to use Exists( Select *.... However, i discovered that MS Access would stream across every column in the subquery and then discard them (Select 1/0 also would not work). That convinced me switch to Select 1. If even one DBMS was stupid, another could exist.

Writing Exists( Select 1... is as abundantly clear in conveying intent (It is frankly silly to claim "It's too close to saying "if 'no value' exists" for comfort.") and makes the odds of a DBMS doing something stupid with the Select statement nearly impossible. Select Null would serve the same purpose but is simply more characters to write.

I switched to Exists( Select 1 to make absolutely sure the DBMS couldn't be stupid. However, that was many moons ago, and today I would expect that most developers would expect seeing Exists( Select * which will work exactly the same.

That said, I can provide one good reason for avoiding Exists(Select * even if your DBMS evaluates it properly. It is much easier to find and trounce all uses of Select * if you don't have to skip every instance of its use in an Exists clause.

Thomas
  • 63,911
  • 12
  • 95
  • 141
3

In SQL Server at least,

The smallest amount of data that can be read from disk is a single "page" of disk space. As soon as the processor reads one record that satisfies the subquery predicates it can stop. The subquery is not executed as though it was standing on it's own, and then included in the outer query, it is executed as part of the complete query plan for the whole thing. So when used as a subquery, it really doesn't matter what is in the Select clause, nothing is returned" to the outer query anyway, except a boolean to indicate whether a single record was found or not...

All three use the exact same execution plan

I always use [Select * From ... ] as I think it reads better, by not implying that I want something in particular returned from the subquery.

EDIT: From dave costa comment... Oracle also uses the same execution plan for all three options

Charles Bretana
  • 143,358
  • 22
  • 150
  • 216
  • In Oracle as well, all three variations (plus SELECT NULL) use the exact same execution plan. Notably, even in the SELECT * plan there was no access to tableB, only to the index on the join column, so clearly the optimizer recognizes that it does not need the actual values for the SELECT *. – Dave Costa Jan 08 '09 at 15:16
  • 1
    * = nothing? I don't get it. In fact * = everything, NULL = Nothing. If you want to "not [imply] that I want something in particular returned" Would SELECT NULL be much, much clearer. –  Jan 08 '09 at 19:30
  • @Mark, Inside of an Where Exists SubQuery, The contents of a select clause are irrelevant... So not only does * = nothing, it also = each thing, anything, and everything and/or something ... ie. it doesn't matter... The reason you have to put "something" there is that a Select clause requires it. – Charles Bretana Jan 08 '09 at 19:40
1

The EXISTS returns a boolean not actual data, that said best practice is to use #3.

shA.t
  • 16,580
  • 5
  • 54
  • 111
SQLMenace
  • 132,095
  • 25
  • 206
  • 225
  • Can you explain why you consider that to be best practice? – Dave Costa Jan 08 '09 at 14:55
  • @Dave - I believe the original thinking behind the 'if exists(select 1 ..)' school of thought was that by returning a constant, SQL would be given more freedom to choose its query plan (e.g. from a narrower NC index), whereas a SELECT * seems to encourage a clustered index seek / scan. However, as OMG Ponies has pointed out, that since SQL doesnt' evaluate the clause anyway, it becomes a matter of readability / uniformity, not one of performance. Unfortunately, given that SELECT * is generally an anti pattern in selects, an exception like this makes coding standards a little harder to enforce. – StuartLC Oct 10 '11 at 12:38
1

This is one of those questions that verges on initiating some kind of holy war.

There's a fairly good discussion about it here.

I think the answer is probably to use the third option, but the speed increase is so infinitesimal it's really not worth worrying about. It's easily the kind of query that SQL Server can optimise internally anyway, so you may find that all options are equivalent.

inferis
  • 1,303
  • 1
  • 9
  • 15
0

In addition to what others have said, the practice of using SELECT 1 originated on old Microsoft SQL Server (prior 2005) - its query optimizer wasn't clever enough to avoid physically fetching fields from the table for SELECT *. No other DBMS, to my knowledge, has this deficiency.

The EXISTS tests for existence of rows, not what's in them, so other than some optimizer quirk similar to above, it doesn't really matter what's in the SELECT list.

The SELECT * seems to be most usual, but others are acceptable as well.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
0

Execution Plan. Learn it, use it, love it

There is no possible way to guess, really.

Thuglife
  • 394
  • 2
  • 3
-4

#3 Should be the best one, as you don´t need the returned data anyway. Bringing the fields will only add an extra overhead

  • 1
    I would hope that any modern database query optimizer would recognize that no data is being used from the subquery and treat all the variants identically. – Dave Costa Jan 08 '09 at 14:54