Modern alternative to DELETE ... WHERE ... NOT IN (...)?

Question

Taking the following two table variables, is there any (faster/shorter/more efficient) alternative to delete any data from @tbl_big that doesn't exist within @tbl_small? The reason is that I have a large selection of data, and based on optional parameters (in a procedure) I filter the data down further before presenting to the user.

The userId column is indexed in the physical table these table variable columns are populated from.

DECLARE @tbl_big TABLE (userID int);
INSERT INTO @tbl_big (userID) VALUES (1),(5),(10),(20),(30),(40),(60),(100);

DECLARE @tbl_small TABLE (userID int);
INSERT INTO @tbl_small (userID) VALUES (1),(5),(10),(20)

-- this deletes 30,40,60,100 from @tbl_big
DELETE FROM 
    @tbl_big 
WHERE
    (userID NOT IN (SELECT userID FROM @tbl_small));

SELECT * from @tbl_big;

http://www.sqlfiddle.com/#!6/d41d8/12674

I did wonder whether INTERSECT or EXCEPT might do this but can't figure them out.

Update: EXCEPT/INTERSECT findings included as an answer. I'm not sure if there's a shorter way from a coding perspective...?

score 2 · Accepted Answer · answered Dec 11 '13 at 12:56

2

Typically, exists or left outer join would do the trick.

DELETE b FROM @tbl_big b
WHERE NOT EXISTS (
    SELECT 1 FROM @tbl_small s
    WHERE s.userID = b.userID);

OR

DELETE b FROM @tbl_big b
LEFT OUTER JOIN @tbl_small s
    ON s.userID = b.userID
WHERE s.userID IS NULL;

answered Dec 11 '13 at 12:56

David

19,389
12
63
87

2

+1 for NOT EXISTS, -1 for LEFT OUTER JOIN. http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join – Aaron Bertrand Dec 11 '13 at 13:13
Amazing article. Why is `SELECT 1` used please instead of `SELECT *` as the article mentions? Also, what are the problems mentioned in the article that could arise from duplicates/NULLs? – EvilDr Dec 11 '13 at 13:31
1

Yeah, I'm a proponent of the NOT EXISTS. Sometimes it depends, you can find examples on the intertubes where LEFT OUTER was more performant over not exists, especially when nesting exists statements. The `SELECT 1` vs `SELECT *` I am of the impression that either are ok, i.e. performance is not affected. I think some people are just totally anti-`SELECT *` that they choose to `SELECT 1` instead. – David Dec 11 '13 at 13:42
Accepted this answer as the comments were extremely helpful. I'm not convinced that any alternative method is any better than the original, but it was interesting finding out. Thank you all. – EvilDr Dec 12 '13 at 07:56
1

And +1 to Dave for having the best moustache on StackOverflow! – EvilDr Dec 12 '13 at 07:56

score 1 · Answer 2 · answered Dec 11 '13 at 12:41

1

Well this is modern :) but i'm not sure it's faster than a delete where not exists :

DECLARE @tbl_big TABLE (userID int)
DECLARE @tbl_small TABLE (userID int)

INSERT INTO @tbl_big (userID) VALUES (1),(5),(10),(20),(30),(40),(60),(100)
INSERT INTO @tbl_small (userID) VALUES (1),(5),(10),(20)

;with deleting as (
SELECT * from @tbl_big
except
select * from @tbl_small)

DELETE b 
FROM 
    @tbl_big b
where exists ( 
select * from deleting c where c.userID = b.userID)

SELECT * from @tbl_big

answered Dec 11 '13 at 12:41

Mathese F

559
4
9

Thanks for that; I had not thought of using CTE's. As with my suggestion, it looks pretty cumbersome doesn't it! – EvilDr Dec 11 '13 at 12:52
It's also slow unless you replace the `WHERE EXISTS (SELECT...)` with `WHERE userID in...` – Twinkles Dec 11 '13 at 13:01
not really, exists is faster than in as explained here : http://stackoverflow.com/questions/2065329/sql-server-in-vs-exists-performance and i'm already limiting my results in the cte. So it means for each it will check perhaps 10-100-1000 lines (the differences) instead of a full group – Mathese F Dec 11 '13 at 13:05

score 1 · Answer 3 · answered Dec 11 '13 at 12:59

DECLARE @tbl_big TABLE (userID int)
DECLARE @tbl_small TABLE (userID int)
DECLARE @tbl_new TABLE (userID int)

INSERT INTO @tbl_big (userID) VALUES (1),(5),(10),(20),(30),(40),(60),(100)
INSERT INTO @tbl_small (userID) VALUES (1),(5),(10),(20)

INSERT INTO @tbl_new 
SELECT userID
FROM @tbl_big 
INTERSECT 
SELECT userID
FROM @tbl_small;

SELECT * from @tbl_new;

score 0 · Answer 4 · answered Dec 11 '13 at 12:50

I finally figured EXCEPT out, although its not exactly easier to read!

DELETE
    b
FROM 
    @tbl_big b INNER JOIN
    (
    SELECT
        *
    FROM
        @tbl_big EXCEPT 
    SELECT
        userID
    FROM
        @tbl_small) s ON s.userID = b.userID;

I realise now also that INTERSECT would actually delete the wrong half of the data (e.g. data NOT IN @tbl_small)

Modern alternative to DELETE ... WHERE ... NOT IN (...)?

4 Answers4