SQL Server Performance: LEFT JOIN vs NOT IN

Question

The output of these two queries is the same. Just wondering if there's a performance difference between these 2 queries, if the requirement is to list down all the customers who have not purchased anything? The database used is the Northwind sample database. I'm using T-SQL.

select companyname 
from Customers c
left join Orders o on c.customerid = o.customerid
where o.OrderID is null

select companyname 
from Customers c
where Customerid not in (select customerid from orders)

Which [DBMS](https://en.wikipedia.org/wiki/DBMS) product are you using? "SQL" is just a query language, not the name of a specific database product. Please add a [tag](https://stackoverflow.com/help/tagging) for the database product you are using `postgresql`, `oracle`, `sql-server`, `db2`, ... — , Mar 12 '19 at 15:01
You should tag your RDBMS as performance will be optimiser dependant. I assume likely SQL Server from Northwind? I would use `NOT EXISTS` generally — Martin Smith, Mar 12 '19 at 15:01
There's no way of answering your question as it is asked now. Even though your question is fair, it's fairly incomplete. Databases do not exist in the void. You need to specify which database engine the database is running on. Once you say that, then get the execution plan for both queries. Some databases optimizers are smarter than others and may produce better optimizations for each case. But that ultimately depends on the specific engine and version. — The Impaler, Mar 12 '19 at 15:08
Possible duplicate of [What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?](https://stackoverflow.com/questions/2246772/whats-the-difference-between-not-exists-vs-not-in-vs-left-join-where-is-null) — Heinzi, Mar 12 '19 at 15:21

score 2 · Answer 1 · answered Mar 12 '19 at 15:14

If you want to find out empirically, I'd try them on a database with one customer who placed 1,000,000 orders.

And even then you should definitely keep in mind that the results you'll be seeing are valid only for the particular optimiser you're using (comes with particular version of particular DBMS) and for the particular physical design you're using (different sets of indexes or different detailed properties of some index might yield different performance characteristics).

score 1 · Accepted Answer · answered Mar 12 '19 at 15:28

Potentially the second is faster if the tables are indexed. So if orders has an index on customer ID, then NOT IN will mean that you aren't bringing back the entire ORDERS table.

But as Erwin said, a lot depends on how things are set up. I'd tend to go for the second option as I don't like bringing in tables unless I need data from them.

SQL Server Performance: LEFT JOIN vs NOT IN

2 Answers2