Slow WHERE EXISTS in SQL Server

Question

I have a delete query that uses WHERE EXISTS but it is very slow.

delete 
from dwload.dbo.TF_Full_Backup_AT
where exists (select *
              from dwload.dbo.TFTimesheetTemp t
              inner join dwload.dbo.TFEmpTemp e on t.EmployeeCode = e.EmployeeCode
              where dwload.dbo.TF_Full_Backup_AT.TFSourceID = e.TFSourceID
                and dwload.dbo.TF_Full_Backup_AT.StartTime = t.StartTime
                and dwload.dbo.TF_Full_Backup_AT.EndTime = t.EndTime
                and dwload.dbo.TF_Full_Backup_AT.ActivityCode = t.ActivityCode
                and dwload.dbo.TF_Full_Backup_AT.PaymentCode = t.PaymentCode
                and dwload.dbo.TF_Full_Backup_AT.BranchCode = t.BranchCode)

How can I rewrite this query to run faster?

@Hogan - should not matter if the select is located in an `exists` clause. — Igor, Dec 12 '18 at 21:50
At a very minimum you should [include the actual Execution Plan](https://stackoverflow.com/a/7359705/1260204), you could use [Paste the Plan](https://www.brentozar.com/pastetheplan/) and share the link in your question. Also [try to read it yourself](https://stackoverflow.com/a/759097/1260204), maybe you can figure out the performance issue(s) with your query. — Igor, Dec 12 '18 at 21:50
@Igor -- correct doing a * or a 1 has no functional difference. Much faster to read 1 column instead of all of them. — Hogan, Dec 12 '18 at 21:50
@Hogan: since it's part of an `IF EXISTS()`, there's **no data being read**- so `SELECT * ` vs. `SELECT 1 ` **makes NO difference** — marc_s, Dec 12 '18 at 21:51
@Igor I think the point that Igor is making is that it's something simple for SQL Server to optimize - it knows you don't actually ever read the column values so it won't get them - same performance as SELECT 1. — Kevin Doyon, Dec 12 '18 at 21:52
@Hogan I don't think so, unless you have an ordering clause, SQL Server will just stop after the first result. If there is a sorting though, it needs to get all the results and sort, then return the first result. But I imagine it's an optimization that SQL Server can already do? `EXISTS` only needs one row to be true, so I would think SQL Server knows that and doesn't fetch everything for no reason? — Kevin Doyon, Dec 12 '18 at 21:53
Do you have indexes on the table being deleted from or delete triggers? Either of these things could also slow down a delete significantly. — Igor, Dec 12 '18 at 21:54
How big is the table you're deleting from? How many indexes/triggers? This is probably slow because of the I/O involved with logging all of the delete activity in a single operation, rather than anything to do with the query used to find the rows to delete. See [this post](https://sqlperformance.com/2013/03/io-subsystem/chunk-deletes). — Aaron Bertrand, Dec 12 '18 at 21:55
That points towards `stuff` happening on the delete such as index updates and/or foreign key checks and/or triggers etc. As @Igor said you need the execution plan to investigate. — Dale K, Dec 12 '18 at 21:55
@opperman How many rows is it trying to delete? I've had success in the past deleting rows in batches. For example delete the top 500 in a loop, break out of the loop when @@ROWCOUNT == 0. If you want to test out if the deleting is taking time (vs the query), put the primary key of everything want to delete (basically the original query turned into an INSERT) in a temporary table, then simply join from that temporary table to `TF_Full_Backup_AT` in your DELETE statement. You can add PRINTS with the current time to see how much time is spent on each part. — Kevin Doyon, Dec 12 '18 at 21:55
@KevinDoyon -- I don't know what you mean by one row -- this is a correlated sub-query. — Hogan, Dec 12 '18 at 22:02
create a temporary table and get all the ids to delete and then check with IN — Shereen, Dec 12 '18 at 22:08

score 0 · Accepted Answer · answered Dec 12 '18 at 22:17

It's hard to tell from your query what is hapenning.

Try something like this:

DECLARE @ToDelete TABLE
(
    TF_Full_Backup_AT_ID INT PRIMARY KEY
)

DECLARE @LastTimeStamp DATETIME = GETUTCDATE()

INSERT INTO @ToDelete (TF_Full_Backup_AT_ID)
SELECT DISTINCT b.TF_Full_Backup_AT_ID
from TF_Full_Backup_AT b
    JOIN dwload.dbo.TFTimesheetTemp t ON b.StartTime = t.StartTime
                                        and b.EndTime = t.EndTime
                                        and b.ActivityCode = t.ActivityCode
                                        and b.PaymentCode = t.PaymentCode
                                        and b.BranchCode = t.BranchCode
    inner join dwload.dbo.TFEmpTemp e on t.EmployeeCode = e.EmployeeCode
where dwload.dbo.TF_Full_Backup_AT.TFSourceID = e.TFSourceID

PRINT 'Time taken to populate the temporary table: ' + CONVERT(varchar, CAST((GETUTCDATE()-@LastTimeStamp) as time(3)))
SET @LastTimeStamp = GETUTCDATE()

DELETE b
FROM @ToDelete d
    JOIN dwload.dbo.TF_Full_Backup_AT b ON b.TF_Full_Backup_AT_ID = d.TF_Full_Backup_AT_ID


PRINT 'Time taken to delete the rows: ' + CONVERT(varchar, CAST((GETUTCDATE()-@LastTimeStamp) as time(3)))

Note: I made an assumption that you had a primary key TF_Full_Backup_AT_ID on your table.

If you find that the actual delete (and not the query that finds which row to delete) is taking a lot of time, there's not much you can do with your query. You could delete in batches, for example:

WHILE 1 = 1
    BEGIN
        DELETE TOP (5000) b
            FROM @ToDelete d
                JOIN dwload.dbo.TF_Full_Backup_AT b ON b.TF_Full_Backup_AT_ID = d.TF_Full_Backup_AT_ID

        IF @@ROWCOUNT = 0
            BREAK

    END

Thanks Kevin. `dwload.dbo.TFTimesheetTemp unfortunately does not have a key, it is populated by an API that I from our clock in system. I can explore giving it a key by concatenating the integer value of start date and time with the employee code; this should identify each line uniquely — opperman.eric, Dec 12 '18 at 22:26
You can add an int identity column which is automatically populated for the purposes of having a unique key. However based on your comments above the issue isn't the query, its the delete. — Dale K, Dec 12 '18 at 22:33

score -3 · Answer 2 · answered Dec 12 '18 at 21:55

Try the following code (I take it that you're using MS SQL)

delete 
from dwload.dbo.TF_Full_Backup_AT
from dwload.dbo.TFTimesheetTemp t
              inner join dwload.dbo.TFEmpTemp e on t.EmployeeCode = e.EmployeeCode
INNER JOIN 
dwload.dbo.TF_Full_Backup_AT
ON              
 dwload.dbo.TF_Full_Backup_AT.TFSourceID = e.TFSourceID
                and dwload.dbo.TF_Full_Backup_AT.StartTime = t.StartTime
                and dwload.dbo.TF_Full_Backup_AT.EndTime = t.EndTime
                and dwload.dbo.TF_Full_Backup_AT.ActivityCode = t.ActivityCode
                and dwload.dbo.TF_Full_Backup_AT.PaymentCode = t.PaymentCode
                and dwload.dbo.TF_Full_Backup_AT.BranchCode = t.BranchCode

Explanation:
You first write what you're deleting from Then you write the JOIN expression That should be quicker

This *should* not make any impact on performance. The real reason is not known so this answer is pure speculation. Only after the OP posts the query plan might we/you have a better understanding of what is causing the performance bottle neck. — Igor, Dec 12 '18 at 21:58
Once you write inner joins you can see better the indexes, searches and so even if you don't look closely at the query plan — Hila DG, Dec 12 '18 at 22:30

Slow WHERE EXISTS in SQL Server

2 Answers2