80

Below is the code sample:

CREATE TABLE #titles(
    title_id       varchar(20),
    title          varchar(80)       NOT NULL,
    type           char(12)          NOT NULL,
    pub_id         char(4)               NULL,
    price          money                 NULL,
    advance        money                 NULL,
    royalty        int                   NULL,
    ytd_sales      int                   NULL,
    notes          varchar(200)          NULL,
    pubdate        datetime          NOT NULL
 )
 GO

 insert #titles values ('1', 'Secrets',   'popular_comp', '1389', $20.00, $8000.00, 10, 4095,'Note 1','06/12/94')
 insert #titles values ('2', 'The',       'business',     '1389', $19.99, $5000.00, 10, 4095,'Note 2','06/12/91')
 insert #titles values ('3', 'Emotional', 'psychology',   '0736', $7.99,  $4000.00, 10, 3336,'Note 3','06/12/91')
 insert #titles values ('4', 'Prolonged', 'psychology',   '0736', $19.99, $2000.00, 10, 4072,'Note 4','06/12/91')
 insert #titles values ('5', 'With',      'business',     '1389', $11.95, $5000.00, 10, 3876,'Note 5','06/09/91')
 insert #titles values ('6', 'Valley',    'mod_cook',     '0877', $19.99, $0.00,    12, 2032,'Note 6','06/09/91')
 insert #titles values ('7', 'Any?',      'trad_cook',    '0877', $14.99, $8000.00, 10, 4095,'Note 7','06/12/91')
 insert #titles values ('8', 'Fifty',     'trad_cook',    '0877', $11.95, $4000.00, 14, 1509,'Note 8','06/12/91')
 GO


CREATE TABLE #sales(
    stor_id        char(4)           NOT NULL,
    ord_num        varchar(20)       NOT NULL,
    ord_date       datetime          NOT NULL,
    qty            smallint          NOT NULL,
    payterms       varchar(12)       NOT NULL,
    title_id       varchar(80)
)
 GO
insert #sales values('1', 'QA7442.3', '09/13/94', 75, 'ON Billing','1')
insert #sales values('2', 'D4482',    '09/14/94', 10, 'Net 60',    '1')
insert #sales values('3', 'N914008',  '09/14/94', 20, 'Net 30',    '2')
insert #sales values('4', 'N914014',  '09/14/94', 25, 'Net 30',    '3')
insert #sales values('5', '423LL922', '09/14/94', 15, 'ON Billing','3')
insert #sales values('6', '423LL930', '09/14/94', 10, 'ON Billing','2')


SELECT    title, price
FROM      #titles
WHERE     EXISTS
(SELECT   *
FROM      #sales
WHERE     #sales.title_id = #titles.title_id
AND       qty >30)


SELECT    t.title, t.price
FROM     #titles t
inner join #sales s on t.title_id = s.title_id
where s.qty >30 

I want to know what is the difference between the above 2 queries which gives the same result.Also want to know the purpose of EXISTS keyword and where exactly to use?

thor
  • 21,418
  • 31
  • 87
  • 173
satyajit
  • 2,540
  • 11
  • 33
  • 44
  • 3
    I would recommend using `SELECT NULL` for your exists query, since there's no need to actually take the time to pull values into a result set. I don't know if sql-server allows `SELECT NULL` - if it doesn't you could probably `SELECT 1` or something equally small. – corsiKa Aug 16 '11 at 17:43
  • 7
    @Glowcoder - there's no difference between `SELECT NULL` `SELECT 1` or `SELECT <20 fields>` in SQL Server. It just checks the tables and filters. – JNK Aug 16 '11 at 17:44
  • 2
    Please note that `EXISTS` with an outer reference **is** a join, not just a *clause*. It is a `semi-join` (and `NOT EXISTS` is an `anti-semi-join`). Also, you can use EXISTS to join tables, one example being `Customer C JOIN OrderCategory OC ON EXISTS (SELECT 1 FROM Order O WHERE C.CustomerID = O.CustomerID AND OC.OrderCategoryID = O.OrdercategoryID)`. – ErikE Aug 16 '11 at 17:46
  • @JNK that might be true, I don't know SQL Server. It can have an effect in other SQL flavors, and is a general good practice, so I thought I'd toss it along. – corsiKa Aug 16 '11 at 17:47
  • @Glowcoder, JNK is correct. I use `SELECT 1` out of habit, to self-document that this does not return any data, but the optimizer truly doesn't care what you put there. – Aaron Bertrand Aug 16 '11 at 17:48
  • @glowcoder : @JNK is correct, the performance gain of `SELECT ` is absolutely tiny, unimportant, and largely theoretical. Also, in Oracle, the myth about `SELECT 1` being better was actually wrong (it was materially slower than `SELECT *`) so they had to put special coding in later Oracle versions to detect the use of `SELECT 1` and recover proper performance. It's funny how prevalent "programming folk tales" are. – ErikE Aug 16 '11 at 17:48
  • 1
    @ErikE you're not using `EXISTS` to join tables, you're merely using `EXISTS` as the criteria to join tables, which is no different than how it works in `WHERE` – corsiKa Aug 16 '11 at 17:48
  • 2
    @glowcoder Can you rewrite my example query (joining two tables through an intermediate many-to-many table) without EXISTS, that performs as well or better? How would it work in the WHERE clause (yes it's possible but would be far less clear, looking like old-style non-ANSI joins, but I'd like to see your answer). `EXISTS` with an outer reference **is a join**. Look at the execution plan. – ErikE Aug 16 '11 at 17:51
  • @ErikE All I'm saying is that the decision to include a row using `EXISTS` on a `JOIN..ON` statement is the same decision to include it on a `WHERE` statement - there's nothing special about it. – corsiKa Aug 16 '11 at 17:57
  • @JNK According to Conor Cunningham, Principal Software Architect for SQL Server Engine, "The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed. "SELECT 1" will avoid having to examine any unneeded metadata for that table during query compilation. However, at runtime the two forms of the query will be identical and will have identical runtimes. **As such, I typically use SELECT 1.**" – corsiKa Aug 16 '11 at 18:06
  • [source from above, lack of space](http://web.archive.org/web/20080915043540/http://www.sqlskills.com/blogs/conor/2008/02/07/EXISTSSubqueriesSELECT1VsSELECT.aspx) So, the team lead on SQL Server Engine's recommends using it. I also recommend using it. – corsiKa Aug 16 '11 at 18:06
  • @glowcoder You didn't answer my question. I'll do it for you: `FROM Customer C CROSS JOIN OrderCategory OC WHERE EXISTS (SELECT 1 FROM Order O WHERE C.CustomerID = O.CustomerID AND OC.OrderCategoryID = O.OrdercategoryID)`. Whether it's in the WHERE clause or not, it's special because it is a **semi-join**. Could you acknowledge this please? Have you actually looked at an execution plan as I suggested? – ErikE Aug 16 '11 at 18:06
  • @glowcoder P.S. Yes, we agree that there is some tiny potential overhead of using `EXISTS(SELECT *)` (I use `SELECT 1` myself). Yet, the Microsoft SQL Server documentation says this: "The select list of a subquery introduced with EXISTS, by convention, has an asterisk (*) instead of a single column name. The rules for a subquery introduced with EXISTS are the same as those for a standard select list, because a subquery introduced with EXISTS creates an existence test and returns TRUE or FALSE, instead of data." (T-SQL reference, Subquery Rules). – ErikE Aug 16 '11 at 18:12
  • @glowcoder - Far be it from me to say Conor is wrong but [I looked into this here](http://stackoverflow.com/questions/1597442/subquery-using-exists-1-or-exists/6140367#6140367) and came to the conclusion that the expansion still happens for `1` – Martin Smith Sep 25 '11 at 11:21
  • @ErikE - BTW Just realised that in the previous version of that answer I had mentioned my findings RE: column count without actually giving any results demonstrating this aspect so I have redone it with additional results. – Martin Smith Oct 02 '11 at 11:33

4 Answers4

167

EXISTS is used to return a boolean value, JOIN returns a whole other table

EXISTS is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN is used to extend a result set by combining it with additional fields from another table to which there is a relation.

In your example, the queries are semantically equivalent.

In general, use EXISTS when:

  • You don't need to return data from the related table
  • You have dupes in the related table (JOIN can cause duplicate rows if values are repeated)
  • You want to check existence (use instead of LEFT OUTER JOIN...NULL condition)

If you have proper indexes, most of the time the EXISTS will perform identically to the JOIN. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS.

If your JOIN key is not indexed, it may be quicker to use EXISTS but you will need to test for your specific circumstance.

JOIN syntax is easier to read and clearer normally as well.

Stuck
  • 11,225
  • 11
  • 59
  • 104
JNK
  • 63,321
  • 15
  • 122
  • 138
  • Very useful answer, though I wonder if it's worth mentioning `GROUP BY`? One of the big drawbacks of a `JOIN` is when it multiplies the results and requires grouping to reduce them again; if you can replace with `EXISTS` and there are suitable keys this will often be much faster than grouping if you don't actually need any aggregate functions (`AVG`, `SUM` etc.). – Haravikk Apr 13 '22 at 15:41
49
  • EXISTS is a semi-join
  • JOIN is a join

So with 3 rows and 5 rows matching

  • JOIN gives 15 rows
  • EXISTS gives 3 rows

The result is the "short circuit" effect mentioned by others and no need to use DISTINCT with a JOIN. EXISTS is almost always quicker when looking for existence of rows on the n side of a 1:n relationship.

Luke Girvin
  • 13,221
  • 9
  • 64
  • 84
gbn
  • 422,506
  • 82
  • 585
  • 676
19

EXISTS is primarily used to shortcut. Essentially the optimizer will bail out as soon as the condition is true, so it may not need to scan the entire table (in modern versions of SQL Server this optimization can occur for IN() as well, though this was not always true). This behavior can vary from query to query, and in some cases the join may actually give the optimizer more opportunity to do its job. So I think it's hard to say "this is when you should use EXISTS, and this is when you shouldn't" because, like a lot of things, "it depends."

That said, in this case, since you have essentially a 1:1 match between the tables, you are unlikely to see any performance difference and the optimizer will likely produce a similar or even identical plan. You may see something different if you compare join/exists on the sales table when you add 50,000 rows for each title (never mind that you will need to change your join query to remove duplicates, aggregate, what have you).

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • 4
    +1 for "It depends". I have been doing a lot of code tweaking recently and replacing `JOIN` conditions with `EXISTS` based on my belief it would be faster. When I actually tested they are identical in a majority of cases for me (I think since the `JOIN` were always on the cluster key) – JNK Aug 16 '11 at 17:56
6

I find exists to be most useful when I have rows I would like to exclude based on how they interact with other rows.

For example,

SELECT * 
  FROM TABLE a
 WHERE a.val IN (1,2,3)
   AND NOT EXISTS(SELECT NULL
                    FROM TABLE b
                   WHERE b.id = a.id
                     AND b.val NOT IN (1, 2, 3))

In this case, I'm excluding a row in my a query based on having a b record with the same id but being invalid.

This actually came from a production problem I had at work. The query moved most of the exclusion logic in the query instead of in the application, taking load time from over 24 seconds to under 2 seconds. :-)

Community
  • 1
  • 1
corsiKa
  • 81,495
  • 25
  • 153
  • 204