Fastest way to get records in one set but not another

Question

In SQL Server I have a query like this

with ds as (select ...)
select * 
from table
where id not in (select * from ds)

but the inner select statement and the outer one (from table) both are like 200k records, and this is taking too long to compute. Is there a more efficient way to find records in table, that are not from the nested select statement?

Use a `NOT EXISTS`? `NOT IN (SELECT *...`) won't work though; `IN` expects a set of *scalar* values to be returned. — Thom A, Apr 19 '22 at 17:14
You can use an OUTER JOIN or a NOT EXISTS. Those are generally more performant than an IN clause. Also make sure you have indexes on the columns you are joining/matching. — Igor, Apr 19 '22 at 17:15
Does this answer your question? [Find records from one table which don't exist in another](https://stackoverflow.com/questions/367863/find-records-from-one-table-which-dont-exist-in-another) — Igor, Apr 19 '22 at 17:16
[Example E: Using NOT EXISTS](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/exists-transact-sql?view=sql-server-ver15#e-using-not-exists) — Thom A, Apr 19 '22 at 17:17
You'll have to test it, because it depends. Please read right to the end (or at the very least read the conclusion paragraph): https://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join Also, see [Which is faster?](https://ericlippert.com/2012/12/17/performance-rant/) — Aaron Bertrand, Apr 19 '22 at 17:33
Stop guessing. Look at the execution plan to understand what the engine is doing and evaluate what you can do to improve it. — SMor, Apr 19 '22 at 17:58
For performance questions you need to tell us what indexes the tables have, and share the execution plan via https://brentozar.com/pastetheplan — Charlieface, Apr 20 '22 at 01:26
@Igor That is a complete myth: `EXISTS` and `IN` [perform the same](https://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/) (and usually better than `LEFT JOIN... IS NULL`) except that `NOT IN` can give incorrect results where nulls are involved. — Charlieface, Apr 20 '22 at 01:30
Good link @Charlieface, thank you. I did a little more reading on the matter and it seems that whether a IN/NOT IN is performant can also depend on if the column is nullable or not. See also https://stackoverflow.com/a/11074428/1260204. I retract what I said in my earlier comment about performance. — Igor, Apr 20 '22 at 12:30
@Igor That is only true in the case of `NOT IN` not `IN`, and in any case the issue of incorrect results is more overriding than its performance. The reason it is slower is that it has to do more work to get those incorrect results: the semantics of `NOT IN` are the same as `a <> b1 AND a <> b2` which is subtly different if there are nulls, whereas `IN` becomes `a = b1 OR a = b2` which behaves the same whether or not there are nulls. You can see this in the query plans — Charlieface, Apr 20 '22 at 13:35

mohsen mashhadi · Answer 1 · 2022-04-20T08:45:48.923

0

I think you can put the first query in "view" or "function" and then use them for other usages. example:

create view View1 ... usage: select * from View1

create function Function1 (@value int) ... usage: select * from Function1(20)

edited Apr 20 '22 at 08:45

answered Apr 20 '22 at 04:10

mohsen mashhadi

201
2
8

And how is this going to perform better? – Luuk Apr 20 '22 at 08:55
In this link, you can see the differences between View and CTE: https://stackoverflow.com/questions/30918633/sql-cte-vs-view – mohsen mashhadi Apr 21 '22 at 06:27

Fastest way to get records in one set but not another

1 Answers1