1

I am working with a project which involves processing a lot of text files and results in either inserting records into an mssql db or updating existing information.

The sql statement is written and stored in a list until the files have finished being processed. This list is then processed. Each statement was being processed one at a time but as this could be thousands of statements and could create a very long running process.

To attempt to speed up this process i introduced some parallel processing but this occasionally results in the following error:

Transaction (Process ID 94) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Code as follows:

public static void ParallelNonScalarExecution(List<string> Statements, string conn)
    {
        ParallelOptions po = new ParallelOptions();
        po.MaxDegreeOfParallelism = 8;
        CancellationTokenSource cancelToken = new CancellationTokenSource();
        po.CancellationToken = cancelToken.Token;
        Parallel.ForEach(Statements, po, Statement =>
        {
            using (SqlConnection mySqlConnection = new SqlConnection(conn))
            {
                mySqlConnection.Open();
                using (SqlCommand mySqlCommand = new SqlCommand(Statement, mySqlConnection))
                {
                    mySqlCommand.CommandTimeout = Timeout;
                    mySqlCommand.ExecuteScalar();
                }            
            }
        });
    }

The update statements i believe are simple in what they are trying to achieve:

UPDATE TableA SET Notes = 'blahblahblah' WHERE Code = 1 
UPDATE TableA SET Notes = 'blahblahblah', Date = '2016-01-01' WHERE Code = 2
UPDATE TableA SET Notes = 'blahblahblah' WHERE Code = 3 
UPDATE TableA SET Notes = 'blahblahblah' WHERE Code = 4
UPDATE TableB SET Type = 1 WHERE Code = 100
UPDATE TableA SET Notes = 'blahblahblah', Date = '2016-01-01' WHERE Code = 5
UPDATE TableB SET Type = 1 WHERE Code = 101

What is the best way to overcome this issue?

Barnsley
  • 117
  • 2
  • 17
  • Catch deadlock graph with `profiler`. And is this sql code demonstrated a single update statement or this is a list of executed in parallel single-line statements? – Ivan Starostin Jun 30 '16 at 16:29
  • Maybe you should redesign what you are doing. Looks like you are creating a list of statements and then trying to execute them, opening a new connection per statement? Why would you want to do that. Maybe it is much easier and faster to load your text files to temp tables on SQL server and then do simple server side insert/update (which likely would be a single insert and a single update). – Cetin Basoz Jun 30 '16 at 16:34
  • @CetinBasoz i believe i need one connection per thread. I previously did open just one connection but experienced issues with the command as noted here: [link](http://stackoverflow.com/questions/18475195/error-there-is-already-an-open-datareader-associated-with-this-command-which-mu) – Barnsley Jul 01 '16 at 08:12
  • That link is unrelated and code used is buggy, I wouldn't use as a reference. That is not what I am saying. I am saying that, probably what you are doing is something that could be done with single or two commands executed in a single batch. – Cetin Basoz Jul 01 '16 at 15:15

2 Answers2

0

Thread A updates resource X and does not commit and can continue doing more updates. Thread B updates resource y and does not commit and can continue doing more updates. At this point, both have uncommitted updates.

Now thread A updates resource y and waits on the lock from Thread B. Thread B is not held up by anything, so it goes on, eventually tries to update resource x and is blocked by the lock A has on x. Now they are in a deadlock. It's a stalemate, not one can proceed to commit, so the system kills one. You have to commit more often to reduce the chances of a deadlock (but that does not eliminate the possibility entirely), or you have to carefully order your updates so all updates to x get done and completed before going on to do any updates on y.

Jabrwoky
  • 61
  • 6
0

From what I see you don't want to do what you are doing. I would NOT recommend having multiple update statements that effect the same data/table on different threads. This is the breeding of a race condition/dead lock. In your case it should be safe, but if at any point you changed the where condition and there was overlap you would have a race condition issue.

If you really wanted to speed this up with multi-threading than having all of the update statements for tableA on one thread and all of the update statements on tableB on one thread. Another idea is to block your update statements.

UPDATE TableA SET Notes = 'blahblahblah' WHERE Code IN (1,2,3,4,5)
UPDATE TableA SET Date = '2016-01-01' WHERE Code IN (2,5)
UPDATE TableB SET Type = 1 WHERE Code IN (100,101)

These above statements should be able to be independently execute in a concurent enviroment as no two statements effect the same column?

NewDeveloper
  • 301
  • 1
  • 9
  • " I would NOT recommend having multiple update statements that effect the same data/table on different threads" then you are saying that parallelism is impossible. Every honest DBMS is capable to that kind of thing – deFreitas Mar 05 '21 at 00:17