58

If I want to find a set of entries in table A but not in table B, I can use either LEFT OUTER JOIN or NOT EXISTS. I've heard SQL Server is geared towards ANSI and in some case LEFT OUTER JOINs are far more efficient than NOT EXISTS. Will ANSI JOIN perform better in this case? and are join operators more efficient than NOT EXISTS in general on SQL Server?

kefeizhou
  • 6,234
  • 10
  • 42
  • 55

4 Answers4

76

Joe's link is a good starting point. Quassnoi covers this too.

In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. have a lots of rows EXIST in the subquery) NOT EXISTS will perform better.

EXISTS and NOT EXISTS both short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.

LEFT JOIN will join ALL RECORDS regardless of whether they match or not, then filter out all non-matching records. If your tables are large and/or you have multiple JOIN criteria, this can be very very resource intensive.

I normally try to use NOT EXISTS and EXISTS where possible. For SQL Server, IN and NOT IN are semantically equivalent and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.

JNK
  • 63,321
  • 15
  • 122
  • 138
  • 4
    EXCEPT and INTERSECT are the same as (NOT) EXISTS too. And NOT IN fails with NULLs of course... – gbn Jul 21 '11 at 14:54
  • @gbn - didn't realize that. I normally only use those when a lot of fields are involved and it's harder to decipher in an `EXISTS` clause – JNK Jul 21 '11 at 14:55
  • Avoid `NOT IN`. At least when writing code or using an ORM, the number of elements in the "In list" is limited. See also: http://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join. – Jess Jul 17 '15 at 18:19
  • 1
    i Just have one concern. If i have 10000 matching record and use not exist in sub query. Dosen't this means my sub query will run 10000 time (i.e. for each record) and degrade the performance, on other left join will bring all the record at once and then filter 10000 matching record and give better performance – Avinash Jain Nov 20 '15 at 10:36
  • 2
    **this answer is not correct anymore**. In SQL server 2017 (and later (and maybe even earlier)) both can shortcircuit. In most cases query optimizer correctly transforms `left join where is null` to `anti semi join`, just as it does with `not exists`. Sometimes each operators produces different plan. Sometimes one is faster than the other. **It depends.** – andowero May 30 '20 at 12:50
  • @andowero Please back that up, I get a left join plan on [SQLfiddle which is 2017](http://sqlfiddle.com/#!18/9eecb/113717/0) – Charlieface Mar 01 '21 at 20:55
  • @Charlieface As I have written - _It depends_. In your SQLfiddle the plan produced seems to be a "trivial plan" for that particular select, which the SQL Server deemed "good enough" since both table variables are assumed to contain 1 row (and they really do). As one SQL server senior programmer once said: "SQL Server has good optimizer, so two different queries producing identical datasets _tend to_ (emphasis on _tend_) have identical execution plans" – andowero Mar 03 '21 at 06:10
  • When I have `Top (100)` in the query, the difference is significant. the `Not Exists` vs `Left Join` is `7%` vs `93%`. I have 3M records in table A, 4M records in table B. The queries are like `select top (100) A.* from A where not exists (select 1 from B where A.c = B.c)` and `select top (100) A.* from A left join B on A.c = B.c where B.c = null` – Eben Oct 20 '22 at 10:13
2

Answer on dba.stackexchange

An exception I've noticed to the NOT EXISTS being superior (however marginally) to LEFT JOIN ... WHERE IS NULL is when using Linked Servers.

From examining the execution plans, it appears that NOT EXISTS operator gets executed in a nested loop fashion. Whereby it is executed on a per row basis (which I suppose makes sense).

Example execution plan demonstrating this behaviour: enter image description here

pim
  • 12,019
  • 6
  • 66
  • 69
2

Personally, I think that this one gets a big old, "It Depends". I've seen instances where each method has outperformed the other.

Your best bet is to test both and see which performs better. If it's a situation where the tables will always be small and performance isn't as crucial then I'd just go with whichever is the clearest to you (that's usually NOT EXISTS for most people) and move on.

Tom H
  • 46,766
  • 14
  • 87
  • 128
  • It really depends, I just had two rewrite a query that was using not exists, and replaced `not exists` with `left outer join with null check`, yes it did perform much better. But always go for Not Exists, most of the time it will perform much better,and the intent is clearer when using `Not Exists`. – hazimdikenli Apr 22 '15 at 06:58
0

This blog entry gives examples of various ways ( NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT and NOT EXISTS ) to achieve same results and proves that Not Exists ( Left Anti Semi Join) is the best options in both cold cache and warm cache scenarios.

N30
  • 3,463
  • 4
  • 34
  • 43