Is a single DELETE statement (with WHERE( ...SELECT...)) referentially safe?

Question

I know that any single SQL statement is implicitly run inside a transaction.
But is this enough to guarantee relational integrity for a delete from ... where (... select... ) statement?

How does the isolation level play into this?

Here's my concrete example.

I have two tables: Call and User, and a foreign key Call.UserId -> User.Id.

Several calls can point to the same user.
It's possible that no calls point to a particular user.
Some calls don't have an associated user.
Classic zero-or-one to many relationship: Call [*] -- [0..1] User.

An user is said to be orphaned if there are no calls pointing to it. New calls are added all the time, so an orphaned user may become not orphaned at some point in the future.

I want to clean up orphaned users. This can be done in a single SQL statement:

delete [dbo].[User] 
    FROM [dbo].[User] AS [U]
    WHERE ( NOT EXISTS (SELECT 
        1 AS [C1]
        FROM [dbo].[Call] AS [C]
        WHERE [U].[Id] = [C].[UserId]
    ))

The question is: Is this safe? (taking into account possible Call insertions running in parallel)

I mean, if this (excuse my pseudo-SQL):

BEGIN TRANSACTION

@orphanIds = SELECT U.Id
    FROM [dbo].[User] AS [U]
    WHERE ( NOT EXISTS (SELECT 
        1 AS [C1]
        FROM [dbo].[Call] AS [C]
        WHERE [U].[Id] = [C].[UserId]
    ))

DELETE FROM [dbo].[User] 
    WHERE Id in (@orphanIds)

COMMIT

...is equivalent to the single statement, the operation is NOT safe using SQL Server's default isolation level of READ COMMITTED.

Between the select and the delete, another transaction that inserts Calls may be committed, rendering (some) of the just-selected users non-orphan, and thus failing my transaction with a FK violation. This is easy to test: just add a WAITFOR between the select and the delete, and try inserting new calls while this transaction is waiting. The insert operation will execute immediately and will cause the FK violation in this transaction.

IMHO, While two different statements may have a third statement executed between them, that's not the possible to execute that statement while a single statement is executing, so I think applying the the where clause directly in the delete statement is safe, unlike the second example you gave where it's not safe. — Zohar Peled, Jan 19 '16 at 11:12
With ReadCommited a single isolated statement should not be disturbed by other calls. Your own thoughts seem to be quite OK... — Shnugo, Jan 19 '16 at 11:38

score 2 · Accepted Answer · edited May 23 '17 at 11:52

Your concern is warranted. The DELETE statement will not keep a lock on the Call table during its complete execution. As stated on MSDN:

By default, a DELETE statement always acquires an exclusive (X) lock on the table it modifies, and holds that lock until the transaction completes. With an exclusive (X) lock, no other transactions can modify data

However, the Call table is not the one that is modified. The inner SELECT statement will issue a shared lock, but that lock will not last until the end of the whole statement, as stated on MSDN:

Shared (S) locks on a resource are released as soon as the data has been read

So, although a concurrent INSERT Into Call statement would have to wait for the SELECT to finish, it will get through right after, and could execute in concurrence with the DELETE action.

This question lists a few similar cases a SO user experienced.

You could apply a HOLDLOCK locking hint:

HOLDLOCK - Hold a shared lock until completion of the transaction instead of releasing the lock as soon as the required table, row, or data page is no longer required. HOLDLOCK is equivalent to SERIALIZABLE.

You would write your statement like this:

DELETE FROM User
WHERE  Id NOT IN (
    SELECT UserId FROM Call WITH (HOLDLOCK)
)

Be aware of potential deadlocks though. This could happen if in a concurrent process something like this is being executed:

INSERT INTO Call (Id, UserId)
SELECT 123, Id
FROM   User WITH (HOLDLOCK)
WHERE  Name = 'Johnson'

A way to avoid such deadlocks is to make sure that tables are locked in the same order at all instances.

Note that using an isolation level of REPEATABLE READ will not offer the necessary protection, as stated on MSDN:

This prevents other transactions from modifying any rows that have been read by the current transaction. Other transactions can insert new rows that match the search conditions of statements issued by the current transaction.

I'm just starting to figure out how locking hints work. If `HOLDLOCK` is similar to a `serializable` isolation level, wouldn't running the delete statement inside a `repeatable read` transaction be 'lighter' (and preserve data integrity)? — Cristian Diaconescu, Jan 19 '16 at 20:58
It is lighter, but too light for this purpose. I added a paragraph to my answer. — trincot, Jan 19 '16 at 21:08

score 0 · Answer 2 · answered Jan 19 '16 at 13:55

0

You are right, it will not be safe, so i think in such cases you can use exclusive lock on the call table until the end of the transaction.

answered Jan 19 '16 at 13:55

Majdi

176
10

1

Do you have any reference for this claim? – trincot Jan 19 '16 at 14:35

Is a single DELETE statement (with WHERE( ...SELECT...)) referentially safe?

2 Answers2