170

It seems to me that you can do the same thing in a SQL query using either NOT EXISTS, NOT IN, or LEFT JOIN WHERE IS NULL. For example:

SELECT a FROM table1 WHERE a NOT IN (SELECT a FROM table2)

SELECT a FROM table1 WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.a = table2.a)

SELECT a FROM table1 LEFT JOIN table2 ON table1.a = table2.a WHERE table1.a IS NULL

I'm not sure if I got all the syntax correct, but these are the general techniques I've seen. Why would I choose to use one over the other? Does performance differ...? Which one of these is the fastest / most efficient? (If it depends on implementation, when would I use each one?)

froadie
  • 79,995
  • 75
  • 166
  • 235
  • 7
    Many common SQL engines give you the ability to see an execution plan. You can often spot significant differences in efficiency for logically equivalent queries in this way. The success of any method depends on factors such as table size, what indexes are present, and others. – Chris Farmer Feb 11 '10 at 18:46
  • For the NOT EXISTS you should use `SELECT a FROM table1 WHERE NOT EXISTS (SELECT NULL FROM table2 WHERE table1.a = table2.a)` so that no result set need to be created for the sub query. – wich Feb 11 '10 at 18:52
  • 3
    @wich: no database cares about what exactly you return inside the `EXISTS` clause. You may return `*`, `NULL` or whatever: all this will be optimized away. – Quassnoi Feb 11 '10 at 18:57
  • 2
    @wich - why? Both here: http://www.techonthenet.com/sql/exists.php and here: http://msdn.microsoft.com/en-us/library/ms188336.aspx seem to use *... – froadie Feb 11 '10 at 18:57
  • Quassnoi, froadie, Quassnoi you are right the DBMS can optimize it away, but why express an interest in a result if you have none? Be explicit and say that you don't care, make it easy for the execution planner and don't rely on its optimization when you don't need to. – wich Feb 12 '10 at 07:35
  • 9
    @wich: this is not about "expressing interest". This is about the query parser demands you to put something between `SELECT` and `FROM`. And `*` is just easier to type. Yes, `SQL` does bear some resemblance to a natural language, but it is parsed and executed by a machine, a programmed machine. It's not that it will ever suddenly break into your cubicle and shout "stop demanding for the extra fields in an `EXISTS` query because I'm f**g sick of parsing them and then throwing them off!". It's OK with a computer, really. – Quassnoi Feb 12 '10 at 17:23
  • 1
    @Quassnoi if you wrote code for the sole purpose of a machine interpreting it the code would look horrible, and unfortunately quite a few people work like that. If however you write code in another optic, writing code to express what you want the machine to do as a communiqué to your peers you will write better and more maintainable code. Be smart, write code for people, not for the computer. – wich Feb 12 '10 at 18:44
  • @wich: your point was "making it easy for the execution planner". My point is that it doesn't really matter: the execution planner can handle it both ways with equal ease. As for me, I personally use `SELECT NULL` in the `EXISTS` queries — not because it's "more efficient" (which is not) or "more readable" (which is not too), but for the much more important reason of being used to it. – Quassnoi Feb 12 '10 at 22:23
  • @Quassnoi - On rare occasions, I have run into databases which did something with the SELECT statement in an Exists clause. IIRC, old versions of Access would actually return a record before analyzing existence. Thus, SELECT * did not perform as well as say SELECT 1. In addition, I have run into databases (many moons ago) that would throw compiler errors when the EXISTS clause contained a GROUP BY and you used SELECT *. Myself, I use SELECT 1 because I've never run into a DB that didn't handle it correctly and it allows me to search and destroy uses of SELECT * in code. – Thomas Apr 28 '10 at 23:46
  • @Quassnoi - Select NULL would obviously serve the same purpose as SELECT 1. – Thomas Apr 28 '10 at 23:47
  • Just tried in my own case on my database in MS Access 2007 (T-SQL) - NOT IN - about 10 sec - NOT EXISTS - about 1.5 sec – baleks Apr 28 '10 at 16:28
  • Could you please share some of the particulars about the contents in the database? – NickO Nov 04 '12 at 06:01
  • Possible duplicate of [What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?](http://stackoverflow.com/questions/5706437/whats-the-difference-between-inner-join-left-join-right-join-and-full-join) – Ricardo Souza May 16 '16 at 23:50

5 Answers5

157

In a nutshell:

NOT IN is a little bit different: it never matches if there is but a single NULL in the list.

  • In MySQL, NOT EXISTS is a little bit less efficient

  • In SQL Server, LEFT JOIN / IS NULL is less efficient

  • In PostgreSQL, NOT IN is less efficient

  • In Oracle, all three methods are the same.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • 1
    Thanks for the links! And thanks for the quick overview... My office is blocking the link for some reason :P but I'll check it out as soon as I get to a regular computer. – froadie Feb 11 '10 at 18:47
  • 2
    Another point is that if `table1 .a` contains `NULL` the `EXISTS` query will not return this row but the `NOT IN` query will if `table2` is empty. [NOT IN vs. NOT EXISTS Nullable Columns: SQL Server](http://stackoverflow.com/questions/173041/not-in-vs-not-exists/11074428#11074428) – Martin Smith Jun 19 '12 at 07:29
  • @MartinSmith: `NULL NOT IN ()` evaluates to true (not `NULL`), just as `NOT EXISTS (NULL = column)` – Quassnoi Jun 19 '12 at 07:33
  • 2
    @Quassnoi - er, Good point, got that the wrong way round. The `NOT EXISTS` will always return the row but `NOT IN` will only do so if the sub query returns no rows. – Martin Smith Jun 19 '12 at 07:37
6

If the database is good at optimising the query, the two first will be transformed to something close to the third.

For simple situations like the ones in you question, there should be little or no difference, as they all will be executed as joins. In more complex queries, the database might not be able to make a join out of the not in and not exists queryes. In that case the queries will get a lot slower. On the other hand, a join may also perform badly if there is no index that can be used, so just because you use a join doesn't mean that you are safe. You would have to examine the execution plan of the query to tell if there may be any performance problems.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
3

Assuming you are avoiding nulls, they are all ways of writing an anti-join using Standard SQL.

An obvious omission is the equivalent using EXCEPT:

SELECT a FROM table1
EXCEPT
SELECT a FROM table2

Note in Oracle you need to use the MINUS operator (arguably a better name):

SELECT a FROM table1
MINUS
SELECT a FROM table2

Speaking of proprietary syntax, there may also be non-Standard equivalents worth investigating depending on the product you are using e.g. OUTER APPLY in SQL Server (something like):

SELECT t1.a
  FROM table1 t1
       OUTER APPLY 
       (
        SELECT t2.a
          FROM table2 t2
         WHERE t2.a = t1.a
       ) AS dt1
 WHERE dt1.a IS NULL;
onedaywhen
  • 55,269
  • 12
  • 100
  • 138
0

When need to insert data in table with multi-field primary key, consider that it will be much faster (I tried in Access but I think in any Database) not to check that "not exists records with 'such' values in table", - rather just insert into table, and excess records (by the key) will not be inserted twice.

j0k
  • 22,600
  • 28
  • 79
  • 90
baleks
  • 1
  • 1
0

Performance perspective always avoid using inverse keywords like NOT IN, NOT EXISTS, ... Because to check the inverse items DBMS need to runs through all the available and drop the inverse selection.

  • 1
    And what do you propose as workaround when you actually need `NOT`? – dnoeth Apr 06 '16 at 07:57
  • Well when there's no option of cause we need to use NOT operations and thats why they are exists. Best practise is it avoid them when we have any other alternative solutions. – Lahiru Cooray Jun 13 '16 at 03:47
  • @onedaywhen, if an optimizer transforms a query and it returns the wrong result *then it is a bug* – David דודו Markovitz Oct 09 '16 at 10:12
  • @DuduMarkovitz: yes and if you contact the SQL Server team and they acknowledge the bug but refuse to fix it because they say doing so may make queries run slower, then it it is a bug *you need to deal with*. – onedaywhen Oct 10 '16 at 07:27
  • @onedaywhen - This was not a hypothetical scenario I presume :-) Do you by any chance remember the bug details? – David דודו Markovitz Oct 10 '16 at 18:09
  • @DuduMarkovitz: No, sadly it's something the SQL Server team do on principle and in practice. I think it is specifically [this bug](https://connect.microsoft.com/SQLServer/feedback/details/350485/bug-with-newid-and-table-expressions) I ran into: 'we default to "optimize performance" - which is good for 99% of cases... This default to "optimize performance" again, is long-established, and accepted. (Yes, it's not the stance chosen by compilers for conventional programming languages, but so be it)... Hope this explanation helps clarify our reasons for closing this bug as "won't fix".' – onedaywhen Oct 11 '16 at 07:32
  • @onedaywhen, Thank you for the reference. I am familiar with this issue, and my honest opinion - this is not a bug. I claim this for the following reasons:(1) the SQL standard does not force a specific implementation for the WITH clause. It does not say if you have to execute the query once, save the results and reuse it if necessary **or** execute it again and again every time it is being referenced. Oracle optimizer for instance, is doing both. (2) **newid()**, by definition **MUST** return a different value every time it is being invoked. The only issue is that the CTE doc is not clear. – David דודו Markovitz Oct 11 '16 at 08:08
  • @onedaywhen. BTW, it is not even a CTE issue. How about this? **select case floor (rand()*10) when 0 then 0 when 0 then 1 when 0 then 2 when 0 then 3 when 0 then 4 when 0 then 5 when 0 then 6 when 0 then 7 when 0 then 8 when 0 then 9 else -1 end** But even here it is problematic to claim for a bug since nowhere it is guaranteed that the CASE expression will be evaluated only once. – David דודו Markovitz Oct 11 '16 at 08:37
  • @DuduMarkovitz Don't get bogged down in the details of the case but look at the precedence employed in sentencing, m'lud. There are plenty of SQL Server bugs out there if you don't think this one is worthy :) – onedaywhen Oct 11 '16 at 12:04
  • @LahiruCooray, I suggest you'll give it a 2nd thought. You are suggesting inefficient way to implement negation when there are other efficient ways to achieve the same results. – David דודו Markovitz Oct 11 '16 at 15:48