23

The overwhelming majority of people support my own view that there is no difference between the following statements:

SELECT * FROM tableA WHERE EXISTS (SELECT * FROM tableB WHERE tableA.x = tableB.y)
SELECT * FROM tableA WHERE EXISTS (SELECT y FROM tableB WHERE tableA.x = tableB.y)
SELECT * FROM tableA WHERE EXISTS (SELECT 1 FROM tableB WHERE tableA.x = tableB.y)
SELECT * FROM tableA WHERE EXISTS (SELECT NULL FROM tableB WHERE tableA.x = tableB.y)

Yet today I came face-to-face with the opposite claim when in our internal developer meeting it was advocated that select 1 is the way to go and select * selects all the (unnecessary) data, hence hurting performance.

I seem to remember that there was some old version of Oracle or something where this was true, but I cannot find references to that. So, I'm curious - how was this practice born? Where did this myth originate from?

Added: Since some people insist on having evidence that this is indeed a false belief, here - a google query which shows plenty of people saying it so. If you're too lazy, check this direct link where one guy even compares execution plans to find that they are equivalent.

Community
  • 1
  • 1
Vilx-
  • 104,512
  • 87
  • 279
  • 422
  • 2
    You can add `EXISTS (SELECT NULL FROM ...)` to your list :) – ypercubeᵀᴹ May 26 '11 at 11:05
  • 7
    Tell the people from your internal developer meeting that StackOverflow would like to challenge them to a duel. – codeulike May 26 '11 at 11:07
  • 1
    @codeulike - Whoever said that this was a duel? I'm simply curious where this comes from, because it's not the first time I've heard this argument. – Vilx- May 26 '11 at 11:27
  • @S.Lott - What facts? That this is a myth? Honestly, the web's full of facts about that already! What good will it do if I include a few more links here? – Vilx- May 26 '11 at 11:29
  • @ypercube - done and done! :) – Vilx- May 26 '11 at 11:30
  • @Vilx-, I would vote to not use `WHERE EXISTS (SELECT * FROM...` because it confuses people and makes your intent less clear, whereas `WHERE EXISTS (SELECT 1 FROM...` is clear to everyone. Equally `WHERE EXISTS (SELECT null FROM...` is just being clever whilst confusing people. Remember you are writing code primarily for people, not the SQL machine. – Johan May 26 '11 at 11:42
  • @Johan - to me all forms are equally readable. In fact, I prefer the `*` syntax because (IMHO) it more clearly states that you don't care about what comes back there. – Vilx- May 26 '11 at 11:54
  • @S.Lott - Facts aren't that hard to come by. In fact I almost convinced them already, though it was left at "Well 1 is more on the safe side" And really - it's not such a killer problem anyway. The performance is identical in either case and readability too. I don't see the point of making a religious battle out of this. – Vilx- May 26 '11 at 11:58
  • @Vilx: There's no religion. Please **update** the question with the actual timing facts. It helps a lot to show what you've learned about this silliness. – S.Lott May 26 '11 at 12:34
  • @S.Lott - I don't understand. What are you getting at? What would these measures prove? [That even their execution plans are indentical](http://bradsruminations.blogspot.com/2009/09/age-old-select-vs-select-1-debate.html)? How does this help me in tracing the source of this legend? Or maybe you wish to imply that it isn't a legend; that it's the truth? In your deleted answer you stated otherwise. – Vilx- May 26 '11 at 12:52
  • @S.Lott - there, do you believe me now? Really, I don't see how this is helping either me or anybody who might come later. – Vilx- May 26 '11 at 13:44
  • 1
    @Vilx: I already knew this to be true. I have struggled with the same dumb claim that `SELECT *` is slower. I am simply begging for a complete coverage of the issue so I wouldn't have to address the silliness ever again. I simply want the question to include the facts so I don't have to explain to people that `SELECT *` is conventional and has never had a performance impact. Having everything in the question helps all of us who are forced to have this silly conversation. – S.Lott May 26 '11 at 13:50

4 Answers4

23

The main part of your question is - "where did this myth come from?"

So to answer that, I guess one of the first performance hints people learn with sql is that select * is inefficient in most situations. The fact that it isn't inefficient in this specific situation is hence somewhat counter intuitive. So its not surprising that people are skeptical about it. But some simple research or experiments should be enough to banish most myths. Although human history kinda shows that myths are quite hard to banish.

codeulike
  • 22,514
  • 29
  • 120
  • 167
  • 4
    Upvote from me for being the first person to notice my true question! And a plausible explanation too! – Vilx- May 26 '11 at 11:34
14

As a demo, try these

SELECT * FROM tableA WHERE EXISTS (SELECT 1/0 FROM tableB WHERE tableA.x = tableB.y)
SELECT * FROM tableA WHERE EXISTS (SELECT CAST('bollocks' as int) FROM tableB WHERE tableA.x = tableB.y)

Now read the ANSI standard. ANSI-92, page 191, case 3a

If the <select list> "*" is simply contained in a <subquery>
          that is immediately contained in an <exists predicate>, then
          the <select list> is equivalent to a <value expression> that
          is an arbitrary <literal>.

Finally, the behaviour on most RDBMS should ignore THE * in the EXISTS clause. As per this question yesterday ( Sql Server 2005 - Insert if not exists ) this doesn't work on SQL Server 2000 but I know it does on SQL Server 2005+

Community
  • 1
  • 1
gbn
  • 422,506
  • 82
  • 585
  • 676
4

For SQL Server Conor Cunningham from the Query Optimiser team explains why he typically uses SELECT 1

The QP will take and expand all *'s early in the pipeline and bind them to objects (in this case, the list of columns). It will then remove unneeded columns due to the nature of the query.

So for a simple EXISTS subquery like this:

SELECT col1 FROM MyTable WHERE EXISTS (SELECT * FROM Table2 WHERE MyTable.col1=Table2.col2)The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

"SELECT 1" will avoid having to examine any unneeded metadata for that table during query compilation.

However, at runtime the two forms of the query will be identical and will have identical runtimes.

Edit: However I have looked at this in some detail since posting this answer and come to the conclusion that SELECT 1 does not avoid this column expansion. Full details here.

Community
  • 1
  • 1
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • I wonder if this holds true (for SQL Server 2008+) based in the date of that article – gbn May 26 '11 at 11:10
  • Well, this is one interesting source, although I don't think that it was the one that spawned this particular myth. As I said - I think there was an old RDBMS in which it was true, but I don't know which one. – Vilx- May 26 '11 at 11:33
  • @Vilx - But it could be argued that your supposition that it is a myth is itself not correct if there is a performance benefit in at least one RDBMS. (Though in practical terms I would imagine any such benefit as to be tiny in this case) – Martin Smith May 26 '11 at 11:36
  • Indeed, the difference is so obviously negligible that I cannot imagine it spawning this myth. And it was posted in 2008. I'm certain that I've heard this before that. – Vilx- May 26 '11 at 11:38
  • @Vilx - Agreed. I wonder if some prehistoric query optimisers did have issues with `*` here but that would have been before my time. – Martin Smith May 26 '11 at 11:43
  • @gbn - Against SQL Server 2008 in the end [I was not able to measure any performance difference that would bear this out](http://stackoverflow.com/questions/1597442/subquery-using-exists-1-or-exists/6140367#6140367) – Martin Smith May 26 '11 at 14:39
  • Thanks As you no doubt recall I've been caught out recently by change in behaviour :-) I suspect it has been optimised out. FYI I know Frank very well (comment 2) and we've been arguing for years now about this ... – gbn May 26 '11 at 18:49
  • For almost all practical purposes there is no difference between * and something else, and I would guess that this is a reason why changing the behaviour is really not high up in the priority list of thing to do for MS. I would need to dig through some archives, but I can't recall anyone ever mentioning even the plan to look into it. There are more important things to take care of. ... btw, if you look at more recent code, you've seen that I started to use SELECT *. :-) – Frank Kalis May 27 '11 at 08:26
  • @FrankKalis - I tried attaching a debugger while continually recompiling a query using ``SELECT 1`` and can see from the call stack that for a table with many columns it spends a fair amount of time in the `CAlgTableMetadata::LoadColumns` section (presumably expanding out the column metadata) – Martin Smith Nov 14 '11 at 09:13
  • I believe we've just dealt with a gnarly deadlock issue caused by a column being explicitly specified in the SELECT clause used inside an EXISTS subquery; when we replaced "SELECT TOP 1 iAirSegmentID" with "SELECT 1", the deadlocks went away. So the presence of the column, although it shouldn't have mattered, caused a Key Lock to be placed on the used index (because the column wasn't included in the scope of the index). – ALEXintlsos Jul 02 '12 at 21:32
  • 2
    @ALEXintlsos I know it's been several months, but I would love to see a repro of that, or even just the .xdl file... – Aaron Bertrand Apr 17 '13 at 13:00
0

This question has an answer that says it was some version of MS Access that actually did not ignore the field of the SELECT clause. I have done some Access development, and I have heard that SELECT 1 is best practice, so this seems very likely to me to be the source of the "myth."

Performance of SQL EXISTS usage variants

Community
  • 1
  • 1
Darren Griffith
  • 3,290
  • 3
  • 28
  • 35