4

I'm trying to get the fastest performance for this DELETE (and SELECT) query. Is there a better way to DELETE the records, because this takes over 10 minutes to run? I imagine it has to do it's own sort and merge until it can find the records.

SELECT COUNT([VISIT_ID])
FROM [dbo].[I2B2_SRC_VISITS]
WHERE [PATIENT_ID] NOT IN (
    SELECT [PATIENT_ID] FROM [dbo].[I2B2_SRC_PATIENT]
)

DELETE FROM [dbo].[I2B2_SRC_VISITS]
WHERE [PATIENT_ID] NOT IN (
    SELECT [PATIENT_ID] FROM [dbo].[I2B2_SRC_PATIENT]
)

EDIT: I couldn't put the DELETE in front of that query like I did with the SELECT. But this was the end result for the DELETE statement.

DELETE FROM [dbo].[I2B2_SRC_VISITS]
WHERE [VISIT_ID] IN
(
    SELECT a.[VISIT_ID]
    FROM    [dbo].[I2B2_SRC_VISITS] a
            LEFT JOIN [dbo].[I2B2_SRC_PATIENT] b
                ON a.[PATIENT_ID] = b.[PATIENT_ID]
    WHERE b.[PATIENT_ID]  IS NULL
)
JustBeingHelpful
  • 18,332
  • 38
  • 160
  • 245
  • [ExplainExtended](http://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/) [BlogSQLAuthority](http://blog.sqlauthority.com/2008/04/22/sql-server-better-performance-left-join-or-not-in/) and many other online literature will steer you towards NOT IN/NOT EXISTS as you have already written. However, there are always exceptions, such as when you DON'T have the necessary indexes (which could sometimes be valid). Always try as many forms as you know and use what works well. For small datasets though, it probably doesn't matter. – RichardTheKiwi Apr 19 '13 at 03:27

1 Answers1

3

how about doing it via JOIN?

DELETE  a
FROM    [dbo].[I2B2_SRC_VISITS] a
        LEFT JOIN [dbo].[I2B2_SRC_PATIENT] b
            ON a.[PATIENT_ID] = b.[PATIENT_ID]
WHERE   b.[PATIENT_ID] IS NULL

Make sure that column [PATIENT_ID] from both tables has key define on them which makes it more faster.


Right. NOT EXIST is better one.

DELETE  a 
FROM    [dbo].[I2B2_SRC_VISITS] a 
WHERE   NOT EXISTS
        ( 
            SELECT  1 
            FROM    [dbo].[I2B2_SRC_PATIENT] b
            WHERE   a.[PATIENT_ID] = b.[PATIENT_ID] 
        )
John Woo
  • 258,903
  • 69
  • 498
  • 492
  • Can you explain why this would be faster than NOT IN? I would expect a NOT EXISTS alternative, which is constantly proven as the most efficient option, and/or suggestions around proper indexes (without which no solution will be fast). – Aaron Bertrand Apr 19 '13 at 03:20
  • WOW, ALL TIME RECORD! 21 SECONDS! THANK YOU! – JustBeingHelpful Apr 19 '13 at 03:23
  • FYI, I don't have an index on this table, but perhaps I could test with an index later – JustBeingHelpful Apr 19 '13 at 03:24
  • 1
    also try this syntax using `NOT EXIST`: `DELETE a FROM [dbo].[I2B2_SRC_VISITS] a WHERE NOT EXIST ( SELECT 1 FROM [dbo].[I2B2_SRC_PATIENT] b WHERE a.[PATIENT_ID] = b.[PATIENT_ID] )` – John Woo Apr 19 '13 at 03:27
  • later.. I have a due date of tomorrow morning. Will definitely come back to this one – JustBeingHelpful Apr 19 '13 at 03:32
  • hi @AaronBertrand, i tried to find the difference between `LEFT JOIN/IS NULL` and `NOT EXIST` on sqlfiddle and found out that they have the same execution plan. http://www.sqlfiddle.com/#!3/b0c4e/2 what is the reason for that? is it because i have little amount of records being tested? – John Woo Apr 19 '13 at 03:35
  • 1
    Probably - not enough rows to hit a tipping point. See http://www.sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join – Aaron Bertrand Apr 19 '13 at 03:36
  • @MacGyver you're happy with 21 seconds? How many rows did that affect, out of how many rows total? – Aaron Bertrand Apr 19 '13 at 03:37
  • 1
    @JW are you sure they're the same plan? I see an extra TOP operator with NOT EXISTS and an anti-semi join. The other one has LEFT OUTER JOIN which looks like it materializes the rows and **subsequently FILTERs**. Check the first link in my comment against the question. – RichardTheKiwi Apr 19 '13 at 03:38
  • 1
    @AaronBertrand right. justed it here. `NOT EXISTS` is better http://www.sqlfiddle.com/#!3/244f5/1 – John Woo Apr 19 '13 at 03:46
  • 40022 out of 90144065 – JustBeingHelpful Apr 19 '13 at 03:47
  • @RichardTheKiwi Now I realized, `NOT EXISTS` is better. http://www.sqlfiddle.com/#!3/244f5/1 – John Woo Apr 19 '13 at 03:47
  • 3
    @JW A hint. With 2 queries like that, despite all the inaccuracies and lies spouted by the query optimizer, you can see their relative estimated costs by making it a single batch, e.g. http://www.sqlfiddle.com/#!3/244f5/2/0. The (single) plan shows both queries, and the costs are now easier to compare. – RichardTheKiwi Apr 19 '13 at 03:55