Horrible performance using SqlCommand Async methods with large data

Question

I'm having major SQL performance problems when using async calls. I have created a small case to demonstrate the problem.

I have create a database on a SQL Server 2016 which resides in our LAN (so not a localDB).

In that database, I have a table WorkingCopy with 2 columns:

Id (nvarchar(255, PK))
Value (nvarchar(max))

DDL

CREATE TABLE [dbo].[Workingcopy]
(
    [Id] [nvarchar](255) NOT NULL, 
    [Value] [nvarchar](max) NULL, 

    CONSTRAINT [PK_Workingcopy] 
        PRIMARY KEY CLUSTERED ([Id] ASC)
                    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                          IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                          ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

In that table, I have inserted a single record (id='PerfUnitTest', Value is a 1.5mb string (a zip of a larger JSON dataset)).

Now, if I execute the query in SSMS :

SELECT [Value] 
FROM [Workingcopy] 
WHERE id = 'perfunittest'

I immediately get the result, and I see in SQL Servre Profiler that the execution time was around 20 milliseconds. All normal.

When executing the query from .NET (4.6) code using a plain SqlConnection :

// at this point, the connection is already open
var command = new SqlCommand($"SELECT Value FROM WorkingCopy WHERE Id = @Id", _connection);
command.Parameters.Add("@Id", SqlDbType.NVarChar, 255).Value = key;

string value = command.ExecuteScalar() as string;

The execution time for this is also around 20-30 milliseconds.

But when changing it to async code :

string value = await command.ExecuteScalarAsync() as string;

The execution time is suddenly 1800 ms ! Also in SQL Server Profiler, I see that the query execution duration is more than a second. Although the executed query reported by the profiler is exactly the same as the non-Async version.

But it gets worse. If I play around with the Packet Size in the connection string, I get the following results :

Packet size 32768 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 450 ms

Packet Size 4096 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 3667 ms

Packet size 512 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 30776 ms

30,000 ms!! That's over a 1000x slower than the non-async version. And SQL Server Profiler reports that the query execution took over 10 seconds. That doesn't even explain where the other 20 seconds are gone to!

Then I've switched back to the sync version and also played around with the Packet Size, and although it did impact a little the execution time, it was nowhere as dramatic as with the async version.

As a sidenote, if it put just a small string (< 100bytes) into the value, the async query execution is just as fast as the sync version (result in 1 or 2 ms).

I'm really baffled by this, especially since I'm using the built-in SqlConnection, not even an ORM. Also when searching around, I found nothing which could explain this behavior. Any ideas?

Post the query. If the *query* is slow, you should check the *query* for problems. Async methods aren't slow - they don't affect the query execution time **at all**. — Panagiotis Kanavos, Feb 23 '17 at 12:40
Post your query, the execution plan and the *schema* of the underlying table(s). Are there any indexes? How many rows are involved? What does the execution plan show? Are you trying to query a 'varchar' with an 'nvarchar` parameter? — Panagiotis Kanavos, Feb 23 '17 at 12:41
BTW `ExecuteScalar` doesn't mean that a `TOP 1` query will be executed. If your query returns multiple values, it wil *still* send all of them to the client. The client though will simply keep the first one and discard the rest. That's probably why reducing the packet size increases the time - your query has to return a lot more packets — Panagiotis Kanavos, Feb 23 '17 at 12:45
But he states, he has only one row in the table @PanagiotisKanavos — Fildor, Feb 23 '17 at 12:45
@Fildor does he? `ExecuteScalar` doesn't mean one row, it means read only one of the returned rows. The use of a key-value table means that there could be duplicates - which is why key-value tables are considered a very bad design. — Panagiotis Kanavos, Feb 23 '17 at 12:46
@PanagiotisKanavos "In that table, I have inserted 1 record" ... I read that as "I have only one row in that table" Can you confirm that hcd? — Fildor, Feb 23 '17 at 12:47
@Fildor yet the actual *evidence* of the packet sizes and timings shows that there are a lot of packets to move — Panagiotis Kanavos, Feb 23 '17 at 12:48
The schema is SUPER simple : 2 columns Id and Value : `CREATE TABLE [dbo].[Workingcopy]( [Id] [nvarchar](255) NOT NULL, [Value] [nvarchar](max) NULL, CONSTRAINT [PK_Workingcopy] PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] And in the table there is only 1 record. Albeit one with a large "Value" (1.5mb) — hcd, Feb 23 '17 at 12:48
@hcd no repro then. *Where are the indexes?* How many key/values have you stored in there? What is the execution plan? Post code that can actually reproduce the problem - that means post the creation script *and* the script that inserts the data — Panagiotis Kanavos, Feb 23 '17 at 12:48
@PanagiotisKanavos Yes, it's a 1.5 MB value ... But still, that was only my interpretation of that cited sentence. I *could* be mistaken. — Fildor, Feb 23 '17 at 12:49
@hcd 1.5 MB ????? And you ask why retrieving that gets *slower* with decreasing packet sizes? Especially when you use the *wrong* query for BLOBs? — Panagiotis Kanavos, Feb 23 '17 at 12:51
Any difference if you do `command.ExecuteScalarAsync().Result` instead of `await` ? — Magnus, Feb 23 '17 at 12:52
@PanagiotisKanavos That was just playing around on behalf of OP. The actual question is why async is so much slower in comparison to sync with *same* package size. — Fildor, Feb 23 '17 at 12:52
Yes, only 1 record in the whole table !. And no, I am not suprised it takes slower with smaller packages sizes. What I am surprised about is that with the sync version it takes 30ms, and with the async version it takes 30 seconds for the same package size ! (1000x slower) — hcd, Feb 23 '17 at 12:53
@hcd - when adding to the question (your DDL in this case) please update the question instead of adding it to the comments. I have done it for you this time. — Igor, Feb 23 '17 at 12:54
@Magnus : same timings when using .Result instead of await ... — hcd, Feb 23 '17 at 12:56
@hcd - also please update the question for column `id`. In your original question you state its `varchar` but your ddl and query has it listed as `nvarchar`. Could you please correct it? — Igor, Feb 23 '17 at 12:56
Check [Modifying Large-Value (max) Data in ADO.NET](https://msdn.microsoft.com/en-us/library/bb399384(v=vs.110).aspx) for the correct way of retrieving CLOBs and BLOBs. *Instead* of trying to read them as one big value, use `GetSqlChars` or `GetSqlBinary` to retrieve them in a streaming fashion. Also consider storing them as FILESTREAM data - there's no reason to save 1.5MB of data in a table's data page — Panagiotis Kanavos, Feb 23 '17 at 12:57
@PanagiotisKanavos While that is true, it would still be interesting to know the reason for the huge difference in timing when you would expect it to perform rather similarly. At least I would. — Fildor, Feb 23 '17 at 12:59
@Fildor because async isn't slower. The OP is loading a 1.5MB BLOB in an inefficient way. `ExecuteReaderAsync` doesn't mean that the data will be streamed, it means that once the data gets retrieved *in the same way as before*, it will become available to the original context. The 30 sec delay appears only when the OP reduces the packet size, resulting in a lot more network traffic — Panagiotis Kanavos, Feb 23 '17 at 13:00
@Fildor the OP changed the packet size. That means, a lot more packets have to go back and forth - 64 times more. 64 * 0.450 = 28.8 seconds — Panagiotis Kanavos, Feb 23 '17 at 13:01
@PanagiotisKanavos That's not correct. OP writes sync : 20-30 ms and async with everything else same 1800 ms. The effect of changing the packet size is totally clear and expected. — Fildor, Feb 23 '17 at 13:01
@PanagiotisKanavos Well actually, the SQL profiler shows that the execution time for the async version is way slower than the sync version. And this is what puzzles me. I'm trying to figure out now how i can post screenshots here :) — hcd, Feb 23 '17 at 13:02
@hcd - there is a toolbar button to upload an image in your question (you have to edit the question to do this). — Igor, Feb 23 '17 at 13:11
Can you include the connection string (without the password if any, of course)? Is there something non-standard about the endpoint configuration? — Luaan, Feb 23 '17 at 13:42
@hcd it seems that you could remove the part about your attempts to alter package sizes since it seem irrelevant to the problem and causes confusion among some commenters. — Kuba Wyrostek, Feb 23 '17 at 13:57
Async is always going to add more time to a call, and isn't really advisable for operations that return quickly. The time it takes to spin up the task, get it scheduled, and start execution all increase the time to return the result. Having said this, the timings observed are definitely crazy. — , Feb 23 '17 at 15:07
Is there a synchronization context present when the code is executing? If so, is `ConfigureAwait` being used everywhere it should? — Paulo Morgado, Feb 23 '17 at 17:09
Wondering if the increase in elapsed time is an artifact of threading and thread scheduling. The overall task has been split between two threads - the application's thread and the worker thread asynchronously executing the query. They have to synchronize/rendezvous to exchange the result data, which is going to add some overhead, but there can also be latency - each thread doing a whole lot of extra waiting. — Zenilogix, Feb 24 '17 at 00:41
Thnx for all the comments. I've just verified and accepted the answer below from @Luaan. — hcd, Feb 24 '17 at 09:35

score 158 · Accepted Answer · edited Jul 18 '19 at 11:28

On a system without significant load, an async call has a slightly bigger overhead. While the I/O operation itself is asynchronous regardless, blocking can be faster than thread-pool task switching.

How much overhead? Let's look at your timing numbers. 30ms for a blocking call, 450ms for an asynchronous call. 32 kiB packet size means you need you need about fifty individual I/O operations. That means we have roughly 8ms of overhead on each packet, which corresponds pretty well with your measurements over different packet sizes. That doesn't sound like overhead just from being asynchronous, even though the asynchronous versions need to do a lot more work than the synchronous. It sounds like the synchronous version is (simplified) 1 request -> 50 responses, while the asynchronous version ends up being 1 request -> 1 response -> 1 request -> 1 response -> ..., paying the cost over and over again.

Going deeper. ExecuteReader works just as well as ExecuteReaderAsync. The next operation is Read followed by a GetFieldValue - and an interesting thing happens there. If either of the two is async, the whole operation is slow. So there's certainly something very different happening once you start making things truly asynchronous - a Read will be fast, and then the async GetFieldValueAsync will be slow, or you can start with the slow ReadAsync, and then both GetFieldValue and GetFieldValueAsync are fast. The first asynchronous read from the stream is slow, and the slowness depends entirely on the size of the whole row. If I add more rows of the same size, reading each row takes the same amount of time as if I only have one row, so it's obvious that the data is still being streamed row by row - it just seems to prefer to read the whole row at once once you start any asynchronous read. If I read the first row asynchronously, and the second synchronously - the second row being read will be fast again.

So we can see that the problem is a big size of an individual row and/or column. It doesn't matter how much data you have in total - reading a million small rows asynchronously is just as fast as synchronously. But add just a single field that's too big to fit in a single packet, and you mysteriously incur a cost at asynchronously reading that data - as if each packet needed a separate request packet, and the server couldn't just send all the data at once. Using CommandBehavior.SequentialAccess does improve the performance as expected, but the massive gap between sync and async still exists.

The best performance I got was when doing the whole thing properly. That means using CommandBehavior.SequentialAccess, as well as streaming the data explicitly:

using (var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
{
  while (await reader.ReadAsync())
  {
    var data = await reader.GetTextReader(0).ReadToEndAsync();
  }
}

With this, the difference between sync and async becomes hard to measure, and changing the packet size no longer incurs the ridiculous overhead as before.

If you want good performance in edge cases, make sure to use the best tools available - in this case, stream large column data rather than relying on helpers like ExecuteScalar or GetFieldValue.

Great answer. Reproduced the OP's scenario. For this 1.5m string OP is mentioning, I get 130ms for the sync version vs 2200ms for async. With your approach, the measured time for the 1.5m string is 60ms, not bad. — Wiktor Zychla, Feb 23 '17 at 15:07
Good investigations there, plus I learnt a handful of other tuning techniques for our DAL code. — Adam Houldsworth, Feb 23 '17 at 15:09
Just got back to the office and tried the code on my example instead of the ExecuteScalarAsync, but I still got 30seconds execution time with a 512byte packet size :( — hcd, Feb 24 '17 at 09:21
Aha, it did work after all :) But I have to add the CommandBehavior.SequentialAccess to this line : `using (var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess))` — hcd, Feb 24 '17 at 09:30
@hcd My bad, I had it in the text but not in the sample code :) — Luaan, Feb 24 '17 at 10:40
The question is meaningful, but this answer is a gem! And a beautiful line: "If you want good performance in edge cases, make sure to use the best tools available". Thanks! — Krishna Gupta, Jun 16 '20 at 10:47
its working fine for single value, but how can we read/get complete row, as there are multiple scenario where we need to get multiple columns from multiple tables using join? Thanks in advance! — Bharat Bhushan, May 19 '21 at 17:00
@BharatBhushan That's what the argument to (in this case) `GetTextReader` is - it's the ordinal of the column in the result set you want :) — Luaan, May 19 '21 at 19:13

Horrible performance using SqlCommand Async methods with large data

1 Answers1

Linked

Related