
We have a backend job written in C# that uses threading and spawns a new thread every second (we cannot increase this interval). Each thread reads data from the database via a stored procedure and sends a request to an interfacing system. Currently we are facing an issue where the same data is pulled by multiple processes, and we have found deadlocks in our log table. Please suggest how we can implement locking so that a given piece of data is processed by only a single process, while the other processes pick up different data.

DB: SQL Server. SP code given below:

ALTER PROCEDURE [Migration]                
AS                  
BEGIN           
declare @ConversationID varchar(200) = '', @Group varchar(100) = ''
-- Select records with Flag 0                  
select     
top 1 @ConversationID = ConversationID,
@Group = Group     
from [Migration]    
where NewCode = (select top 1 NewCode     
   from [Migration]     
   where Flag = 0 group by NewCode, Group, InsertDate    
   )    
and Flag = 0;   
select  *  from [Migration] where ConversationID = @ConversationID  and Group = @Group;  
BEGIN TRANSACTION                    
BEGIN TRY                    
    update [Migration]  set Flag = 1 where ConversationID = @ConversationID and Group = @Group and 
Flag = 0;                   
    COMMIT TRANSACTION                    
            
END TRY                    
BEGIN CATCH                    
    ROLLBACK TRANSACTION                    
    insert into Logs(ErrorType,Description) values('MigrationError',ERROR_MESSAGE());                 
END CATCH                                   
END    
  • You also have a `TOP 1` in there without an `ORDER BY`, meaning that the instance is free to return whatever arbitrary row it feels like. That is not going to give you consistent and thus reliable results/behaviour. – Thom A Nov 02 '20 at 11:41
  • That same subquery, as well, has a `GROUP BY` yet no aggregation; that is a *very* confused subquery. What is its aim? – Thom A Nov 02 '20 at 11:42
  • Plus (finally?) you have a column called `group` which you don't delimit; however, `GROUP` is a [Reserved Keyword](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/reserved-keywords-transact-sql?view=sql-server-ver15), which means the above SQL would error. It is highly advised that you do not use Reserved (key)words for object names, so that such errors don't occur. – Thom A Nov 02 '20 at 11:45
  • Thanks @Larnu, we used GROUP BY as ConversationID has multiple sub-records in the same table, hence we want to send only 1 ConversationID to process further, for handling similar records in 1 go. – Narender Singh Nov 02 '20 at 11:47
  • The 'Group' is 'CGroup' in the actual query – Narender Singh Nov 02 '20 at 11:48
  • But you're using `TOP 1` (without an `ORDER BY`), @Narender, so you're already limited to 1 arbitrary row. Having a `GROUP BY` makes no sense when you aren't aggregating. The `GROUP BY` should be removed and replaced with an appropriate `ORDER BY`. – Thom A Nov 02 '20 at 11:48
  • *"The 'Group' is 'CGroup' in actual query"* then you should ensure that you don't invalidate your code when you alter it for Stack Overflow, as otherwise it can confuse or derail your question. – Thom A Nov 02 '20 at 11:49
  • The shared query is working fine; the only issue we are encountering is that another process picks the same data before the current process has set the flag. – Narender Singh Nov 02 '20 at 11:58
  • If the 2 processes pick the same **arbitrary row** (I can't emphasise that enough...) they will do; this is called a "race condition". – Thom A Nov 02 '20 at 11:59

1 Answer


The typical deadlock message is something like

Transaction (Process ID 100) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

However, it's hard to see where your query would get deadlocked the traditional way - I can see it being blocked, but not deadlocked, as you typically need more than one update to be occurring within the same transaction. Indeed, your transaction above is only a single command - the transaction around it is fairly meaningless as each command is run in its own implicit transaction anyway.

Instead, I'm guessing that your deadlock message is, instead, something like

Transaction (Process ID 100) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Note the difference in what the deadlock was on - in this case, the lock/communication buffer resources.

These deadlocks are related to issues with parallelism; see (for example) What does "lock | communication buffer resources" mean? for more information.

You will tend to get this if the SP does parallel processing and you're running this SP many times in a row. Setting MAXDOP 1 and/or improving your indexing often fixes such issues - but obviously you will need to do your research on the best approach in your own specific circumstances.
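As a sketch of the hint (assuming the Migration table from the question; the actual statement you hint would be whichever one the deadlock graph implicates):

update [Migration]
set    Flag = 1
where  ConversationID = @ConversationID
       and CGroup = @CGroup
       and Flag = 0
option (maxdop 1);  -- force a serial plan for this statement only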


Regarding the question itself - how to make it so that only one thing can deal with a given row at a time?

There are numerous methods. My usual methods involve statements with OUTPUT clauses. The advantage of those is that you can do a data insert/update as well as record what you have inserted or updated within the same command.

Here is an example from your current thing - it sets the flag to be -1 for the rows it's dealing with.

CREATE TABLE #ActiveMigrations (ConversationID varchar(200), CGroup varchar(100));

-- Select records with Flag 0                  
WITH TopGroup AS (
    select top 1 ConversationID, CGroup, Flag     
    from [Migration]    
    where NewCode = (select top 1 NewCode     
                   from [Migration]     
                   where Flag = 0 group by NewCode, CGroup, InsertDate    
                   )    
          and Flag = 0
    )
UPDATE   TopGroup
    SET  Flag = -1
    OUTPUT INSERTED.ConversationID, INSERTED.CGroup
    INTO #ActiveMigrations (ConversationID, CGroup)
    FROM TopGroup;  

In the above, you have the ConversationID and CGroup in the temporary table (rather than in variables) for further processing, and the flag set to -1 so those rows are not picked up by further processing.

A similar version can be used to track (in a separate table) which are being operated on. In this case, you create a scratch table that includes active conversations/groups e.g.,

-- Before you switch to the new process, create the table as below
CREATE TABLE ActiveMigrations (ConversationID varchar(200), CGroup varchar(100));

You can code this with OUTPUT clauses as above, and (say) include this table in your initial SELECT as a LEFT OUTER JOIN to exclude any active rows.
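A sketch of that join (it assumes the CGroup column name from the comments, and orders by InsertDate, which you would adjust to your real ordering):

select top 1 m.ConversationID, m.CGroup
from   [Migration] m
left outer join ActiveMigrations a
       on  a.ConversationID = m.ConversationID
       and a.CGroup = m.CGroup
where  m.Flag = 0
  and  a.ConversationID is null  -- exclude rows another process is already working on
order by m.InsertDate;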

A simpler version is to use your SELECT as above to get a @ConversationID and @CGroup, then try to insert them into the table

INSERT INTO ActiveMigrations (ConversationID, CGroup)
SELECT @ConversationID, @CGroup
WHERE NOT EXISTS 
      (SELECT 1 FROM ActiveMigrations WHERE ConversationID = @ConversationID AND CGroup = @CGroup);

IF @@ROWCOUNT = 1
  BEGIN
  ...

The key thing to think about with these, is if you have the command running twice simultaneously, how could they interact badly?

To be honest, your code itself

update [Migration]  
set Flag = 1 
where ConversationID = @ConversationID 
     and Group = @Group 
     and Flag = 0;

protects itself because a) it's one command only, and b) it has the Flag=0 protection on the end.

It does waste resources, as the second concurrent process does all the work, then gets to the end and finds it has nothing to do because the first process already updated the row. For something that runs regularly and is likely to hit concurrency issues, you should probably code it more defensively.
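If you do rework it, one standard pattern worth knowing (not part of the answer above; a sketch using SQL Server's UPDLOCK and READPAST table hints, again assuming the CGroup column name) is to claim the next row atomically, so a concurrent caller skips locked rows instead of re-reading them:

;with NextRow as (
    select top 1 ConversationID, CGroup, Flag
    from   [Migration] with (updlock, readpast, rowlock)
    where  Flag = 0
    order by InsertDate
)
update NextRow
set    Flag = -1
output inserted.ConversationID, inserted.CGroup;

UPDLOCK stops two callers from both reading the row while Flag = 0, and READPAST makes the second caller skip past the locked row to a different one rather than blocking on it.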

seanb