10

Environment:

The application (written in C# for .NET 4) has up to 10 threads, and each thread runs in its own AppDomain. Each thread uses an ADO.NET DataReader to read the results of a stored procedure on SQL Server 2008. A thread can also use ADO.NET to perform a write operation (bulk insert). Everything runs on the local machine.

Problem #1:

Occasionally (approximately every 30th run) the execution of a thread slows down drastically. This happens while the DataReader is fetching the stored procedure results after SqlCommand.ExecuteReader(). The read operation usually completes in 10 seconds; when it slows down, it takes 10-20 minutes. SQL Profiler shows that the data is being queried, though very slowly.

Call stack during the slowdown (please note that there are no exceptions):

at SNIReadSync(SNI_Conn* , SNI_Packet** , Int32 )
   at SNINativeMethodWrapper.SNIReadSync(SafeHandle pConn, IntPtr& packet, Int32 timeout)
   at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()
   at System.Data.SqlClient.TdsParserStateObject.ReadBuffer()
   at System.Data.SqlClient.TdsParserStateObject.ReadByteArray(Byte[] buff, Int32 offset, Int32 len)
   at System.Data.SqlClient.TdsParserStateObject.ReadString(Int32 length)
   at System.Data.SqlClient.TdsParser.ReadSqlStringValue(SqlBuffer value, Byte type, Int32 length, Encoding encoding, Boolean isPlp, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.ReadSqlValue(SqlBuffer value, SqlMetaDataPriv md, Int32 length, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlDataReader.ReadColumnData()
   at System.Data.SqlClient.SqlDataReader.ReadColumnHeader(Int32 i)
   at System.Data.SqlClient.SqlDataReader.ReadColumn(Int32 i, Boolean setTimeout)
   at System.Data.SqlClient.SqlDataReader.GetValueInternal(Int32 i)
   at System.Data.SqlClient.SqlDataReader.GetValue(Int32 i)
   at System.Data.SqlClient.SqlDataReader.get_Item(String name)
   at ****.Core.TableDataImporter.ImportDataFromExcel(Int32 tableId, ExcelEntityLocation location, Boolean& updateResult) in …

Problem #2:

Instead of slowing down, a thread can hang.

Call stack:

at SNIReadSync(SNI_Conn* , SNI_Packet** , Int32 )
   at SNINativeMethodWrapper.SNIReadSync(SafeHandle pConn, IntPtr& packet, Int32 timeout)
   at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()
   at System.Data.SqlClient.TdsParserStateObject.ReadBuffer()
   at System.Data.SqlClient.TdsParserStateObject.ReadByte()
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
   at System.Data.SqlClient.SqlDataReader.get_MetaData()
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
   at System.Data.SqlClient.SqlCommand.ExecuteReader()

The call stacks were acquired with debugging tools from a background thread. No exceptions occur in either case; there is only a slowdown or a hang.

SNIReadSync is a mechanism that works at the network level, transmitting packets across the network. We have reproduced this problem on the local machine, which removes network problems from the equation.

We are looking for any input, solutions, or workarounds for these slowdowns/hangs. For now we are planning to detect the slowdown and rerun the operation (a rough retry sketch is appended after the code below). Thanks in advance.

I'm appending a simplified version of the method, as requested:

    public void ImportDataFromExcel()
    {
        try
        {
            var _сonnectionBuilk = ... ; // singleton connection (at the app level)
            var spName = ... ;           // stored procedure name

            var сonnectionToRead = new SqlConnection(connectionString);
            сonnectionToRead.Open();

            var sqlCommand = new SqlCommand(spName);
            sqlCommand.CommandType = CommandType.StoredProcedure;
            sqlCommand.Parameters.Add(param1Name, SqlDbType.Int).Value = ...;
            sqlCommand.Parameters.Add(param2Name, SqlDbType.Int).Value = ...;
            sqlCommand.Parameters.Add(param3Name, SqlDbType.Int).Value = ...;

            sqlCommand.Connection = сonnectionToRead;
            sqlCommand.CommandTimeout = timeout; // 120 sec

            using (var dataReader = sqlCommand.ExecuteReader())
            {
                dataReader.Read();
                .....
                // resolve column ordinals once
                int pos1 = dataReader.GetOrdinal(columnName1);
                int pos2 = dataReader.GetOrdinal(columnName2);
                int pos3 = dataReader.GetOrdinal(columnName3);
                int pos4 = dataReader.GetOrdinal(columnName4);
                .....

                // read the data from the SqlDataReader
                int val1 = dataReader.GetInt32(pos1);
                int val2 = dataReader.GetInt32(pos2);
                int val3 = dataReader.GetInt32(pos3);
                var val4 = dataReader.GetDateTime(pos4);
                .....

                // append the read data to bulkTable
                bulkTable.AddCellValue(val1, val2, val3, val4); // bulkTable wraps a DataTable and appends a DataRow internally

                if (bulkTable.DataTable.Rows.Count > MaxRowsCount)
                {
                    using (var bulkCopy = new SqlBulkCopy(_сonnectionBuilk))
                    {
                        bulkCopy.DestinationTableName = _fullTableName;
                        bulkCopy.WriteToServer(bulkTable.DataTable);
                    }

                    var sqlCommandTransfer = new SqlCommand(spName);
                    sqlCommandTransfer.CommandType = CommandType.StoredProcedure;
                    sqlCommandTransfer.Parameters.Add(param1Name, SqlDbType.Int).Value = ...;
                    sqlCommandTransfer.Connection = _сonnectionBuilk;
                    ....
                    sqlCommandTransfer.ExecuteNonQuery(); // transfers data from the temp bulk table into the original table
                }
            }
        }
        finally
        {
            bulkTable.Dispose();
            сonnectionToRead.Close();
        }
    }
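
For the "detect the slowdown and rerun" workaround mentioned above, this is roughly what we have in mind. It is only a rough, hypothetical sketch (not our production code): RunWithRetry wraps a call to the method above and relies on the command's CommandTimeout to bound each attempt.

    using System;
    using System.Data.SqlClient;

    public static class ImportRetry
    {
        // Rough sketch: rely on CommandTimeout to bound each attempt, and rerun
        // the import a limited number of times when a timeout is detected.
        public static void RunWithRetry(Action runImport, int maxAttempts)
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    runImport(); // e.g. () => importer.ImportDataFromExcel()
                    return;
                }
                catch (SqlException ex)
                {
                    // -2 is the error number ADO.NET reports for a client-side timeout
                    if (ex.Number != -2 || attempt >= maxAttempts)
                        throw;
                    Console.WriteLine("Attempt {0} timed out, retrying...", attempt);
                }
            }
        }
    }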
Cortlendt
  • Can you show code? Especially where you're creating, opening, and closing the connection and executing the reader. – Tim Schmelter Aug 24 '12 at 13:01
  • That sounds more like an issue on the DB side. Is there some worker process on SQL Server that runs periodically? A sync or some other action that is causing a row/table lock? How big are the tables involved? What kind of indexes/constraints, etc.? – Brian Aug 24 '12 at 13:10
  • @TimSchmelter - I've appended code that hangs. – Cortlendt Aug 24 '12 at 14:10
  • How many rows would you estimate are typically being transferred? – PinnyM Aug 24 '12 at 14:14
  • What is `SqlCommand` here, since the ADO.NET `SqlCommand` does not have a `Connector` (but a `Connection`) property? Why are you using a `SqlConnection` **and** a `_сonnectionBuilk = ... ; // singleton connection`? Is the connection pool enabled via the `ConnectionString`? I assume that your singleton connection is static, so the pool cannot reuse a connection for multiple threads and always needs to create a new physical connection. – Tim Schmelter Aug 24 '12 at 14:16
  • There's something else strange in your snippet. You have declared `сonnectionToRead` in the `Try/Catch`, but you're disposing it in the `Finally`; that wouldn't even compile. We can't help you unless you tell us the truth. ;) – Tim Schmelter Aug 24 '12 at 14:21
  • @PinnyM hundreds of thousands of rows – Cortlendt Aug 24 '12 at 14:34
  • @TimSchmelter - the code is a very simplified version of the production code, cut down to fit under SO constraints. Two connections are used because two operations are going on: a read operation and a bulk insert. The static connection is only opened when needed. – Cortlendt Aug 24 '12 at 14:40
  • @Brian - we are consulting with our DB specialist about particular case – Cortlendt Aug 24 '12 at 14:42
  • If you can see this in SQL Profiler, then what is happening with the statement durations during the slow-downs? Are they getting larger or staying the same? – RBarryYoung Aug 24 '12 at 14:49
  • @RBarryYoung - please clarify what is a statement duration? – Cortlendt Aug 24 '12 at 14:52
  • "Duration" is a column that you can add in SQLProfiler under the Performance category. It is only filled-in for the "completion" events (statement, Batch, SP, RPC, etc.), so make sure that you do not have just the "starting" events. – RBarryYoung Aug 24 '12 at 14:56
  • I should add that "Duration" tells you how long that particular SQL event took to complete. If the durations go up by an order of magnitude during the slow-downs, then your problem is on the SQL Server. If they stay the same, then your problem is either in the network or on your Client (presumably in your client program). – RBarryYoung Aug 24 '12 at 15:00
  • @RBarryYoung - Durations are short and the statements are being processed fast (the ones with the SP prefix). – Cortlendt Aug 24 '12 at 15:08
  • Then the problem is not on the DB. You can drop that line of investigation. – RBarryYoung Aug 24 '12 at 15:10

4 Answers

4

We've been trying to debug a similar issue for months, and finally tracked it down today...

We had a query that was being stashed into the cache (without calling ToList/ToArray/etc. on it). The query was effectively tied to a connection that had since been cleaned up, and we got what appeared to be 100% CPU blocking from ReadSni (full stack included below).

I suspect the caching code was written before the query was changed over to use LINQ (it used to return a List<T>, but was still typed as IEnumerable), so the bug was introduced when someone made the data access "lazy".

I can't explain why it only happened every few days in production; either the caching isn't being used a lot, or the connection had to be in a certain state for it to fail in this way.
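
The pattern looked roughly like this. This is a simplified, hypothetical sketch, not our production code: the names are made up and MemoryCache stands in for our real cache.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Runtime.Caching;

    // Hypothetical types and names, for illustration only.
    public class Customer { public int Id; public bool IsActive; }

    public static class CustomerCache
    {
        static readonly MemoryCache Cache = MemoryCache.Default;

        // BUG: this caches the lazy query itself, not its results. Whoever enumerates
        // the cached value later triggers the actual database read, possibly against
        // a connection/session that has already been cleaned up.
        public static IEnumerable<Customer> GetBad(IQueryable<Customer> source)
        {
            var cached = (IEnumerable<Customer>)Cache.Get("customers");
            if (cached == null)
            {
                cached = source.Where(c => c.IsActive);          // not evaluated yet
                Cache.Add("customers", cached, DateTimeOffset.Now.AddMinutes(10));
            }
            return cached;
        }

        // Fix: materialize with ToList() so the actual rows are cached.
        public static IEnumerable<Customer> GetGood(IQueryable<Customer> source)
        {
            var cached = (List<Customer>)Cache.Get("customers");
            if (cached == null)
            {
                cached = source.Where(c => c.IsActive).ToList(); // evaluated here
                Cache.Add("customers", cached, DateTimeOffset.Now.AddMinutes(10));
            }
            return cached;
        }
    }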

OS Thread Id: 0x20b8 (27)
Child SP IP       Call Site
16edd0fc 6184267e System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult, System.Data.SqlClient.TdsParserStateObject)
16edd134 61842624 System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()
16edd144 618446af System.Data.SqlClient.TdsParserStateObject.ReadBuffer()
16edd150 61c583d0 System.Data.SqlClient.TdsParserStateObject.CleanWire()
16edd15c 61d1beb9 System.Data.SqlClient.TdsParser.Deactivate(Boolean)
16edd174 6184995f System.Data.SqlClient.SqlInternalConnectionTds.InternalDeactivate()
16edd180 61849640 System.Data.SqlClient.SqlInternalConnection.Deactivate()
16edd1b0 61849587 System.Data.ProviderBase.DbConnectionInternal.DeactivateConnection()
16edd1e4 61849405 System.Data.ProviderBase.DbConnectionPool.DeactivateObject(System.Data.ProviderBase.DbConnectionInternal)
16edd224 61849384 System.Data.ProviderBase.DbConnectionPool.PutObject(System.Data.ProviderBase.DbConnectionInternal, System.Object)
16edd26c 6184920c System.Data.ProviderBase.DbConnectionInternal.CloseConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)
16edd2ac 618490f7 System.Data.SqlClient.SqlInternalConnection.CloseConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)
16edd2c4 618393bf System.Data.SqlClient.SqlConnection.Close()
16edd304 11238f0a NHibernate.Connection.ConnectionProvider.CloseConnection(System.Data.IDbConnection)
16edd340 11238eae NHibernate.Connection.DriverConnectionProvider.CloseConnection(System.Data.IDbConnection)
16edd34c 11aceb42 NHibernate.AdoNet.ConnectionManager.CloseConnection()
16edd358 11aceb02 NHibernate.AdoNet.ConnectionManager.AggressiveRelease()
16edd364 11acf783 NHibernate.AdoNet.ConnectionManager.AfterTransaction()
16edd370 11acf6d1 NHibernate.Impl.SessionImpl.AfterTransactionCompletion(Boolean, NHibernate.ITransaction)
16edd3ec 11acf5de NHibernate.AdoNet.ConnectionManager.AfterNonTransactionalQuery(Boolean)
16edd3fc 11acf539 NHibernate.Impl.AbstractSessionImpl.AfterOperation(Boolean)
16edd474 130311e4 NHibernate.Impl.SessionImpl.List(NHibernate.IQueryExpression, NHibernate.Engine.QueryParameters, System.Collections.IList)
16ede51c 13031071 NHibernate.Impl.AbstractSessionImpl.List(NHibernate.IQueryExpression, NHibernate.Engine.QueryParameters)
16ede538 13030b68 NHibernate.Impl.ExpressionQueryImpl.List()
16ede568 13030a47 NHibernate.Linq.DefaultQueryProvider.ExecuteQuery(NHibernate.Linq.NhLinqExpression, NHibernate.IQuery, NHibernate.Linq.NhLinqExpression)
16ede59c 11d4c163 NHibernate.Linq.DefaultQueryProvider.Execute(System.Linq.Expressions.Expression)
16ede5b0 11d4c108 NHibernate.Linq.DefaultQueryProvider.Execute[[System.__Canon, mscorlib]](System.Linq.Expressions.Expression)
16ede5c4 11d4c0a6 Remotion.Linq.QueryableBase`1[[System.__Canon, mscorlib]].GetEnumerator()
16ede5d4 61022108 System.Linq.Enumerable+WhereEnumerableIterator`1[[System.__Canon, mscorlib]].MoveNext()
16ede5e4 610166ea System.Linq.Buffer`1[[System.__Canon, mscorlib]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
16ede620 6122e171 System.Linq.OrderedEnumerable`1+<GetEnumerator>d__0[[System.__Canon, mscorlib]].MoveNext()
16ede63c 79b39758 System.Collections.Generic.List`1[[System.__Canon, mscorlib]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
16ede66c 61021acf System.Linq.Enumerable.ToList[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1<System.__Canon>)
Stephan Bauer
Danny Tuppeny
  • Were you able to solve the issue? If so, was it a configuration issue on the SQL server side, the ADO.NET side, or a code change? – cortijon Apr 04 '13 at 22:28
  • A code change. We called ToList() before putting the items in the cache, so the actual items were cached instead of a not-yet-evaluated enumeration. I don't think we've seen it since :-) – Danny Tuppeny Apr 05 '13 at 05:28
1

Because the code works perfectly most of the time, we can narrow it down to:

  • Database locking / blocking from your processes and some others.
  • Database locking / blocking from your processes only.
  • Network connectivity.
  • State of the data.
  • Disk space or some other seemingly unrelated issue on the server.

I'd say it is likely a database lock / block issue. But you really need to determine this for certain. To do this:

  • Make sure there is disk space where the databases are writing to - including the database logs and any other logging.
  • Make sure no other processes are using the database.
  • Preferably use a local database as well to eliminate network issues.
  • You are using .NET 4, so if you are using Tasks it is very easy to make them run synchronously with the overloads (see the sketch below). Do this and see if the issue is still there.

Doing all of the above should eliminate the issues - and you can start narrowing it down further from there.
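
As a rough illustration of the Task point above (the worker method is hypothetical; adapt it to however you currently start your threads/tasks):

    using System;
    using System.Threading.Tasks;

    class ImportRunner
    {
        // Hypothetical stand-in for the per-thread import work.
        static void ImportWorker(int tableId)
        {
            Console.WriteLine("Importing table {0}...", tableId);
            // ... the actual import logic would run here ...
        }

        static void Main()
        {
            // Instead of Task.Factory.StartNew(() => ImportWorker(42)),
            // run the same work on the calling thread to rule out concurrency:
            var task = new Task(() => ImportWorker(42));
            task.RunSynchronously(); // executes inline on the current thread
            task.Wait();             // surfaces any exception from the work
        }
    }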

PeteGO
0

I had the same problem and solved it after several tests. I'm describing my specific case, but I think it could be useful.

In my system I have a complex OLAP engine on SQL Server 2008. Over the years, the operations grew in volume and quantity, and I randomly got the error mentioned in the post; when the operations were run again, the error went away. I double-checked the code of all the OLAP operations, and all transactions, connections, and reader objects were handled perfectly.

The critical points that generated the errors (though not always) were sections with WHILE loops issuing queries against SQL Server; the repeated bombardment of queries generated errors in the DB.

I restructured the many small queries into one larger query (in terms of result set) and performed the operations through LINQ; the result was surprising: performance improved tenfold and there were 0 errors.

The advice I can give is to analyze the critical sections of code carefully, and to pay close attention to how you use connections, DataReaders, and transactions.
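
As a rough illustration of that restructuring (a hypothetical schema and names, not my actual code): instead of issuing one query per key inside a WHILE loop, fetch one larger result set and do the grouping/aggregation in LINQ.

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.Linq;

    class OrderTotals
    {
        // Before: one query per customer inside a WHILE loop (many small round trips).
        // After: one larger query, with the per-customer aggregation done in LINQ.
        static Dictionary<int, decimal> LoadTotals(string connectionString, DateTime since)
        {
            var rows = new List<Tuple<int, decimal>>();

            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "SELECT CustomerId, Amount FROM dbo.Orders WHERE OrderDate >= @since", connection))
            {
                command.Parameters.Add("@since", SqlDbType.DateTime).Value = since;
                connection.Open();

                using (var reader = command.ExecuteReader())
                {
                    int posCustomer = reader.GetOrdinal("CustomerId");
                    int posAmount = reader.GetOrdinal("Amount");
                    while (reader.Read())
                    {
                        rows.Add(Tuple.Create(reader.GetInt32(posCustomer),
                                              reader.GetDecimal(posAmount)));
                    }
                }
            }

            // The aggregation that used to be done with repeated per-customer queries
            // is now done in memory over the single result set.
            return rows.GroupBy(r => r.Item1)
                       .ToDictionary(g => g.Key, g => g.Sum(r => r.Item2));
        }
    }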

daniele3004
0

I know that this question is 8 years old, but maybe my answer will help some poor developer who faces this issue in the future :)

In my case this was a deadlock on the database, but not a really obvious one. When sp_whoisactive was executed, it returned many rows blocked by a single sleeping connection. Initially I thought it might be due to some network issue, but after several hours of tests with WinDbg and even unplugging the network cable, I still wasn't able to reproduce the hang.

In the end I noticed that there were actually 2 connections from the app under the blocking user:

  • the suspended one, which already held some locks
  • the other one, which was blocked by a connection from another user.

It turned out that:

  1. The first connection was opened; it locked some objects in the database, but in the end it received an exception via the RAISERROR command.
  2. In the meantime a connection from another app appeared and took some locks in the DB, but ended up blocked by the first one.
  3. The app that had already made connection no. 1 opened a new one using the RequiresNew transaction option, which was then blocked by the second connection.

Also, all 3 connections were made within a 7 ms window, which would explain why there was no such issue for months.

If you face this issue, I advise using the sp_whoisactive procedure (http://whoisactive.com). You can execute it with the @get_locks = 1 parameter, which will return all objects locked by a given connection. This, combined with the call stack, should give you enough information to fix the problem, or to determine that this is not your case ;)
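
If it helps, here is a rough sketch of capturing such a snapshot from code when a hang is detected. It assumes sp_WhoIsActive is installed on the server; the column names come from its documentation, so verify them against your version.

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class BlockingSnapshot
    {
        // Dump one sp_WhoIsActive snapshot (including lock details) for later analysis.
        static void Capture(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("dbo.sp_WhoIsActive", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                command.Parameters.Add("@get_locks", SqlDbType.Bit).Value = 1;
                connection.Open();

                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        Console.WriteLine("session {0} | status {1} | blocked by {2}",
                            reader["session_id"],
                            reader["status"],
                            reader["blocking_session_id"]);
                    }
                }
            }
        }
    }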

Michal