SQL Server NOT EXISTS efficiency

Question

I have two tables Card and History with a one-to-many relationship: one card can have one or many histories. Card has a CardId column, and History has a CardId and a StatusId column.

I want a SQL script which selects the only cards which have no one history with StatusId=310.

This is what I've tried.

SELECT 
    C.CardId 
FROM 
    Card C
WHERE NOT EXISTS (SELECT * 
                  FROM History H
                  WHERE H.CardId = C.CardId AND H.StatusId = 310)

But I want to know if there is an efficient way.

Thanks in advance.

Why would you want to use `GROUP BY` for it? `WHERE NOT EXISTS` would be the most logical thing to use. — Siyual, Dec 07 '16 at 16:50
I would write it the same (I've read unsubstantiated claims that `select null` in a `where [not] exists` is slightly more performant). If this query doesn't perform like you'd like, look at the execution plan. — HoneyBadger, Dec 07 '16 at 16:55
Yes exists is an efficient way. It performs an anti-join, which basically means that it will stop scanning the history table as soon as a matching record is found, rather than doing any further needless work. This is the most efficient way of doing it. — GarethD, Dec 07 '16 at 16:57
According to your comment to an answer below your query runs in 1600 ms. How performant do you need it? Are you sure you are looking for performance increase in the right area? Under normal circumstances I'd be perfectly happy with 1600 ms. — HoneyBadger, Dec 07 '16 at 17:31
@HoneyBadger if you check out Aaron Bertrand's article in my answer below, he says SQL Server doesn't care what column(s) you use inside EXISTS and optimizes them away. I've read those same claims though, and I end up using select 1 or select null just in case. — SqlZim, Dec 07 '16 at 17:48
@SqlZim good link, thanks. I've suspected as much. I still always use NULL, just for clarity if nothing else. — HoneyBadger, Dec 07 '16 at 17:50

SqlZim · Accepted Answer · 2017-01-27T14:01:57.067

To answer your question about whether there is a more efficient way:

Short answer: No, not exists is most likely the most efficient method.

Long answer: Aaron Bertrand's article with benchmarks for many different methods of doing the same thing Should I use not in, outer apply, left outer join, except, or not exists? Spoiler: not exists wins.

I would stick with your original code using not exists, but Aaron's article has many examples you can adapt to your situation to confirm that nothing else is more efficient.

select c.CardId 
  from Card c
  where not exists(
    select 1
      from History h
      where h.CardId=c.CardId 
        and h.StatusId=310
             )

Thanks for your answer. – Mihai Alexandru-Ionut Dec 07 '16 at 17:52 — Mihai Alexandru-Ionut, Dec 07 '16 at 17:52

score 3 · Answer 2 · answered Dec 07 '16 at 17:07

3

What you're after is called a "negative JOIN". Here's a version without the subquery:

SELECT c.CardId
FROM Card c
LEFT OUTER JOIN History h
  ON h.CardId = c.CardId
  AND h.StatusId = 310
WHERE h.CardId IS NULL;

answered Dec 07 '16 at 17:07

dmfay

2,417
1
11
22

score 1 · Answer 3 · answered Dec 07 '16 at 16:51

1

Your NOT EXISTS method looks reasonable but there is another approach.

SELECT *
FROM   Card C
WHERE  EXISTS (SELECT 1
               FROM   history H
               WHERE  H.CardId = C.CardId
               HAVING Count(CASE WHEN H.StatusId = 310 THEN 1 END) = 0)

Check the performance by running both the queries with your real time data

answered Dec 07 '16 at 16:51

Pரதீப்

91,748
19
131
172

It seems that `NOT EXISTS` is faster. `1600` vs `2100` milliseconds.Thanks for your approach. – Mihai Alexandru-Ionut Dec 07 '16 at 17:08

SQL Server NOT EXISTS efficiency

3 Answers3