5

There is a rare situation when process just freezes during execution of ExecuteNonQuery:

cmd.ExecuteNonQuery();

CommandTimeout is set to 0 (no timeout) and a simple UPDATE statement is being executed. This works usually fast and the issue can be reproduced in average once per week. But when it happens, it's a complete freeze/hung for the process. Last time it was reproduced we used ProcDump to get a dump file for that particular process and can see this stacktrace there. The only difference between other SO questions are first few lines (deadlock?)

OS Thread Id: 0x4168 (34)
Current frame: ntdll!NtWaitForSingleObject+0xc
ChildEBP RetAddr  Caller, Callee
23eed994 7587f699 KERNELBASE!WaitForSingleObjectEx+0x99, calling ntdll!NtWaitForSingleObject
23eeda08 7587f5f2 KERNELBASE!WaitForSingleObject+0x12, calling KERNELBASE!WaitForSingleObjectEx
23eeda1c 5829e5e2 System_Data!Np::ReadSync+0x205, calling KERNELBASE!WaitForSingleObject
23eeda58 58293812 System_Data!SNIReadSync+0x64
23eeda88 5828a795 System_Data!SNIReadSyncOverAsync+0x25, calling System_Data!SNIReadSync
23eedaa4 53ce748e (MethodDesc 5395aa60 +0x46 DomainBoundILStubClass.IL_STUB_PInvoke(SNI_ConnWrapper*, SNI_Packet**, Int32))
23eedac8 53ce748e (MethodDesc 5395aa60 +0x46 DomainBoundILStubClass.IL_STUB_PInvoke(SNI_ConnWrapper*, SNI_Packet**, Int32))
23eedae4 53cf3c57 (MethodDesc 539498d0 +0x53 SNINativeMethodWrapper.SNIReadSyncOverAsync(System.Runtime.InteropServices.SafeHandle, IntPtr ByRef, Int32)), calling 08a34a78
23eedb04 53e00e5e (MethodDesc 5394a534 +0x5e System.Data.SqlClient.TdsParserStateObject.GetTimeoutRemaining()), calling (MethodDesc 53950710 +0 System.Data.SqlClient.TdsParserStaticMethods.GetTimeoutMilliseconds(Int64))
23eedb18 53e027fb (MethodDesc 5394a90c +0x73 System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()), calling (MethodDesc 539498d0 +0 SNINativeMethodWrapper.SNIReadSyncOverAsync(System.Runtime.InteropServices.SafeHandle, IntPtr ByRef, Int32))
23eedb4c 53e02726 (MethodDesc 5394a8f8 +0x6e System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()), calling (MethodDesc 5394a90c +0 System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync())
23eedb5c 53e0181b (MethodDesc 5394a624 +0x63 System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()), calling (MethodDesc 5394a8f8 +0 System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket())
23eedb68 53e019b3 (MethodDesc 5394a660 +0x2b System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte ByRef)), calling (MethodDesc 5394a624 +0 System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer())
23eedb78 53dedda0 (MethodDesc 5394f064 +0x20c System.Data.SqlClient.TdsParser.TryRun(System.Data.SqlClient.RunBehavior, System.Data.SqlClient.SqlCommand, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.BulkCopySimpleResultSet, System.Data.SqlClient.TdsParserStateObject, Boolean ByRef)), calling (MethodDesc 5394a660 +0 System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte ByRef))
23eedc0c 53d1d4f0 (MethodDesc 538fe598 +0x7c System.Data.SqlClient.SqlCommand.FinishExecuteReader(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.RunBehavior, System.String, Boolean, Boolean, Boolean)), calling (MethodDesc 5394f064 +0 System.Data.SqlClient.TdsParser.TryRun(System.Data.SqlClient.RunBehavior, System.Data.SqlClient.SqlCommand, System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.BulkCopySimpleResultSet, System.Data.SqlClient.TdsParserStateObject, Boolean ByRef))
23eedc50 53d1d00b (MethodDesc 538fe56c +0xa2f System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(System.Data.CommandBehavior, System.Data.SqlClient.RunBehavior, Boolean, Boolean, Int32, System.Threading.Tasks.Task ByRef, Boolean, Boolean, System.Data.SqlClient.SqlDataReader, Boolean)), calling (MethodDesc 538fe598 +0 System.Data.SqlClient.SqlCommand.FinishExecuteReader(System.Data.SqlClient.SqlDataReader, System.Data.SqlClient.RunBehavior, System.String, Boolean, Boolean, Boolean))
23eedc68 746b59db clr!JIT_Security_Prolog_Framed+0x15c, calling clr!_EH_epilog3
23eedd18 53d1c27d (MethodDesc 538fe53c +0x605 System.Data.SqlClient.SqlCommand.RunExecuteReader(System.Data.CommandBehavior, System.Data.SqlClient.RunBehavior, Boolean, System.String, System.Threading.Tasks.TaskCompletionSource`1<System.Object>, Int32, System.Threading.Tasks.Task ByRef, Boolean ByRef, Boolean, Boolean)), calling 08a34c1c
23eedd40 72017e17 (MethodDesc 71da0878 +0x37 System.Collections.Hashtable.get_Item(System.Object))
23eedda8 53d16fa3 (MethodDesc 538fe22c +0x187 System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(System.Threading.Tasks.TaskCompletionSource`1<System.Object>, System.String, Boolean, Int32, Boolean ByRef, Boolean, Boolean)), calling 08a34b9c
23eede04 53d15f98 (MethodDesc 538fe160 +0xd0 System.Data.SqlClient.SqlCommand.ExecuteNonQuery()), calling 08a38678

I have found several SO questions regarding this topic as well, but none of the answers seems legit.

  1. Need help diagnosing SQL Server strange query timeouts from C#
  2. SqlDataReader hangs on GetValue() method and SNIReadSyncOverAsync
  3. ADO.Net SQLCommand.ExecuteReader() slows down or hangs
  4. SqlDataReader.GetValue Hangs
  5. Windbg - process frozen on SNIReadSyncOverAsync but no deadlock found
  6. https://social.microsoft.com/Forums/en-US/3f62d9ba-87e8-446d-b8ba-a3dfdf78a8e9/no-connection-could-be-made-because-the-target-machine-actively-refused-the-connection?forum=sharepointadmin
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Chuck Norris
  • 15,207
  • 15
  • 92
  • 123
  • Is this really related to the _language_? – jarlh Oct 16 '20 at 19:04
  • Is there anything relevant in the SQL Server logs when this happens? – stuartd Oct 16 '20 at 19:20
  • @jarlh No,I'm more keen to think that it's a .NET framework/ADO.NET issue. – Chuck Norris Oct 16 '20 at 19:28
  • @stuartd No, nothing interesting in logs. Unfortunately, I didn't have Profiler running when I accidentally reproduced the issue (which was reported by other users too). So I don't have much data from SQL side of things. – Chuck Norris Oct 16 '20 at 19:32
  • 1
    Give yourself a LOONG timeout. It'll cause an error, but it's better than a hang; you can handle an error. – Joel Coehoorn Oct 16 '20 at 20:01
  • An 600 timeout (10 minutes). If you have anything legitimately running longer than 10 minutes, this will help you find and fix it, and it protects you from hangs like this. Now you can throw the offending update in a try block, where you have a special catch specifically for timeouts that can run it again. Sure, things suck for 10 minutes, but now it resolves on it's own without manually intervention. – Joel Coehoorn Oct 16 '20 at 20:05
  • 2
    Is it possible your update statement was just being blocked by some other change? If a query is seemingly hanging "forever" it's not usually because it's slow it's usually because it is being forced to wait for something else to commit and release locks. You can check the status next time this happens by looking at the wait and blocking columns in sys.dm_exec_requests. – Aaron Bertrand Oct 16 '20 at 22:09
  • The thing is I'm not sure when I will be able to reproduce this again... There is nothing in sys.dm_exec_request from the day when this was reproduced. – Chuck Norris Oct 17 '20 at 07:15
  • If it never happens again, then why does it matter? The important thing isn't to understand the thing that happened two days ago, the important thing is learning how you can analyze and diagnose the problem in real time the next time it happens. – Aaron Bertrand Oct 17 '20 at 15:08
  • @Aaron, It happens for users periodically, but not very frequently. – Chuck Norris Oct 18 '20 at 14:17
  • 1
    I once had similar problem which was related to network connectivity issues between application and database. If you send a request to database and then wait for reply without timeout, and then before reply arrives connection is (non-gracefully) interrupted then you'll wait for reply forever, because (like with any TCP connection) one cannot find connection is dead without sending data. So I agree with others - never ever use 0 timeout, under any circumstances. – Evk Oct 19 '20 at 07:07
  • 1
    @ChuckNorris waits for the semaphore before the code even gets there. Obviously, causing a deadlock. – l33t Oct 19 '20 at 14:44
  • @l33t what do you mean? what waits for the semaphore? – Chuck Norris Oct 19 '20 at 15:31
  • Just a "Chuck Norris joke". Sorry :P – l33t Oct 19 '20 at 19:55
  • Ah, okay... Nice one xD – Chuck Norris Oct 19 '20 at 20:42

1 Answers1

1

I think the problem is clearly indicated in the last frame of the stack-trace: WaitForSingleObjectEx

Waits until the specified object is in the signaled state, an I/O completion routine or asynchronous procedure call (APC) is queued to the thread, or the time-out interval elapses.

...

If the timeout dwMilliseconds is INFINITE, the function will return only when the object is signaled or an I/O completion routine or APC is queued.

...

A thread that uses a wait function with no time-out interval may cause the system to become deadlocked.

So, yes, I believe it is a deadlock and I would recommend to never use an infinite time-out; use a sensible time-out value in a loop of multiple retries (5 should be enough) always waiting a random amount of time between tries. Note the random wait is important to resolve deadlocks otherwise the deadlock victim will confront the next item in the loop exactly after the same amount of time and, as the operation potentially takes the same amount of time, it will easily become the deadlock victim again.

Lucky Brain
  • 1,551
  • 12
  • 14
  • Are you sure setting CommandTimeout will help to solve this? If this is a deadlock happening in ADO.NET (System.Data) code itself, it may not return anything even with timeout value set, no? – Chuck Norris Oct 20 '20 at 15:29
  • On one side, WaitForSingleObjectEx is called on the way out to the database server and setting the CommandTimeout will resolve (even avoid) the deadlock in the server by itself. On the other side, if there was also a problem in the way the thread is handled in the .NET realm, the random wait in the retry loop will also help. – Lucky Brain Oct 21 '20 at 13:10
  • No, I mean are you sure that CommandTimeout property will be passed to WaitForSingleObject as dwMilliseconds? – Chuck Norris Oct 21 '20 at 15:55
  • 1
    @LuckyBrain: If the database has a deadlock it will never signal the event. If your code is then retrying and giving up you loose context what exactly did lead to the issue. When you take a dump you get the hung state with all parameters which may be important to repro the issue. This assumes you are able to make sense of a memory dump. If you cannot then you will always be blind to a specific problem class which will remain unsolvable. – Alois Kraus Oct 21 '20 at 22:13
  • 1
    @ChuckNorris: Honestly I'm not sure the CommandTimeout is the value passed to WaitForSingleObjectEx because it crosses the .NET boundary and I don't have visibility that far, I'm assuming that one. However my recommendation to avoid infinite time-outs still stands. The best bet is to give it a try. – Lucky Brain Oct 22 '20 at 13:01
  • 1
    @AloisKraus: Yes, it would be interesting to dig deeper and get more information; based in the facts I have available I detected a couple of flaws and I'm hitting on them because they are still flaws and fixing them could resolve the problem. On the other side, about deadlocks there's always a victim and a winner and, in most of the cases, the context of the victim is completely lost. – Lucky Brain Oct 22 '20 at 13:07
  • I have seen many deadlocks in memory dumps and at every occasion I could propose a fix to solve the issue without any retry. Deadlocks are conceptually easy. If you have two threads and two locks: Ensure that both locks are always taken in the same order. A deadlock cannot happen then anymore. It is as easy as that. – Alois Kraus Oct 23 '20 at 05:34
  • @AloisKraus "If the database has a deadlock it will never signal the event" - but sql server will just choose one transaction as victim in this case and kill it, while the other will proceed. In both cases sql server will send a reply back to client. And I remember reading somewhere that sal server checks for deadlocks every 5 seconds, so won't take long for that to be discovered. For that reason I doubt OP has deadlock related to database query. – Evk Oct 23 '20 at 18:54
  • @Evk: Whatever it was it is part of SQL Server where the investigation needs to be continued. The WaitForSingleObject with no timeout? at the client waiting for an answer is innocent (normally). – Alois Kraus Oct 23 '20 at 19:34