7

According to the UPDATE documentation, an UPDATE always acquires an exclusive lock on the whole table. However, I am wondering if the exclusive lock is acquired before the rows to be updated are determined or only just before the actual update.

My concrete problem is that I have a nested SELECT in my UPDATE like this:

UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id 
            FROM Tasks
            WHERE Type = 1
                AND (SELECT COUNT(*) 
                     FROM Tasks 
                     WHERE Status = 'Active') = 0
            ORDER BY Id)

Now I am wondering whether it is really guaranteed that there is exactly one task with Status = 'Active' afterwards if in parallel the same statement may be executed with another Type:

UPDATE Tasks
SET Status = 'Active'
WHERE Id = (SELECT TOP 1 Id 
            FROM Tasks
            WHERE Type = 2           -- <== The only difference
                AND (SELECT COUNT(*) 
                     FROM Tasks 
                     WHERE Status = 'Active') = 0
            ORDER BY Id)

If for both statements the rows to change would be determined before the lock is acquired, I could end up with two active tasks which I must prevent.

If this is the case, how can I prevent it? Can I prevent it without setting the transaction level to SERIALIZABLE or messing with lock hints?

From the answer to Is a single SQL Server statement atomic and consistent? I learned that the problem arises when the nested SELECT accesses another table. However, I'm not sure if I have to care about this issue if only the updated table is concerned.

Community
  • 1
  • 1
lex82
  • 11,173
  • 2
  • 44
  • 69
  • 1
    That documentation is wrong anyway. Updates generally won't lock the whole table. – Martin Smith Jan 08 '16 at 14:41
  • Hm, ok, thanks. But where do I find the correct documentation then? – lex82 Jan 08 '16 at 14:42
  • The documentation doesn't actually say that an `UPDATE` locks the entire table. It says that it acquires an exclusive lock, but an exclusive lock does not have to be on the entire table. – Tom H Jan 08 '16 at 15:05
  • If you lock the whole database you have limited concurrency so modern databases try to lock just the records that are being modified. At least when it is possible. Of course it depends on the database, query and version. Doesn't seems the most robust lock mechnism anyway. This could lead to bugs difficult to reproduce or debug. – borjab Jan 08 '16 at 15:05
  • Well I just wanted to lock the table, not the whole database. What I am doing happens very rarely, so impeded concurrency is not an issue. I just have to be really sure that a single record will be set to active. – lex82 Jan 08 '16 at 15:10

3 Answers3

6

If you want exactly one task with static = active, then set up the table to ensure this is true. Use a filtered unique index:

create unique index unq_tasks_status_filter_active on tasks(status)
    where status = 'Active';

A second concurrent update might fail, but you will be ensured of uniqueness. Your application code can process such failed updates, and re-try.

Relying on the actual execution plans of the updates might be dangerous. That is why it is safer to have the database do such validations. Underlying implementation details could vary, depending on the environment and version of SQL Server. For instance, what works in a single threaded, single processor environment may not work in a parallel environment. What works with one isolation level may not work with another.

EDIT:

And, I cannot resist. For efficiency purposes, consider writing the query as:

UPDATE Tasks
    SET Status = 'Active'
    WHERE NOT EXISTS (SELECT 1
                      FROM Tasks
                      WHERE Status = 'Active'
                     ) AND
          Id = (SELECT TOP 1 Id 
                FROM Tasks
                WHERE Type = 2           -- <== The only difference
                ORDER BY Id
               );

Then place indexes on Tasks(Status) and Tasks(Type, Id). In fact, with the right query, you might find that the query is so fast (despite the update on the index) that your worry about current updates is greatly mitigated. This would not solve a race condition, but it might at least make it rare.

And if you are capturing errors, then with the unique filtered index, you could just do:

UPDATE Tasks
    SET Status = 'Active'
    WHERE Id = (SELECT TOP 1 Id 
                FROM Tasks
                WHERE Type = 2           -- <== The only difference
                ORDER BY Id
               );

This will return an error if a row already is active.

Note: all these queries and concepts can be applied to "one active per group". This answer is addressing the question that you asked. If you have a "one active per group" problem, then consider asking another question.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for pointing this out! Unfortunately this is not possible in the environment I am working in. My statement was simplified anyway so I am not sure if such an index would help if I could set it up. I'd rather like to understand how the UPDATE is processed. – lex82 Jan 08 '16 at 14:48
  • 1
    @lex82 - if the simplification is e.g. that there can be only one active per (combination of one or more other columns) then you'd just have those columns in the index rather than `status`. – Damien_The_Unbeliever Jan 08 '16 at 14:59
1

This not an answer on your question... But your query is pain for my eyes :)

;WITH cte AS 
(
    SELECT *, RowNum = ROW_NUMBER() OVER (PARTITION BY [type] ORDER BY id)
    FROM Tasks
)
UPDATE cte
SET [Status] = 'Active'
WHERE RowNum = 1
    AND [type] = 1
    AND NOT EXISTS(
            SELECT 1
            FROM Tasks
            WHERE [Status] = 'Active'
        )
Devart
  • 119,203
  • 23
  • 166
  • 186
  • 1
    It is possible that the OP's original query is more efficient than this version. Note: The OP's solution should be using `NOT EXISTS` instead of `(SELECT COUNT(*) . . .)`. – Gordon Linoff Jan 08 '16 at 14:44
  • I like the RowNum but I find IsActive confusing because it does not relate to the specific row. I would use a conditional statement but I am pretty sure that this will yield exactly the problems I want to avoid because the table won't be locked. – lex82 Jan 08 '16 at 14:45
  • 1
    @Devart . . . Look at the edited portion of my answer. Your version with the CTE still needs to read the entire table to process the window functions. In this case, we want to go through the effort of finding the appropriate row without having to scan the table or associated indexes. (I should add that from a strictly logical perspective, I do like your answer.) – Gordon Linoff Jan 08 '16 at 15:06
0

No, at least the nested select statement can be processed before the update is started and locks are acquired. To make sure that no other query interferes with this update it is required to set the transaction isolation level to SERIALIZABLE.

This article (and the series it is part of) explains very well the subtleties of concurrency in SQL server:

http://sqlperformance.com/2014/02/t-sql-queries/confusion-caused-by-trusting-acid

lex82
  • 11,173
  • 2
  • 44
  • 69