4

Stumbled upon a relatively often used piece of code which seemed inefficient at first. (I know optimization could be evil sometimes, but I was wondering)

introduction part - fairly simple SP execution + reading the returned data:

try
{
    await connection.OpenAsync();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = sql.ToString();
        command.Parameters.AddRange(sqlParameters.ToArray());

        var reader = await command.ExecuteReaderAsync();
        if (reader.HasRows)
        {
            while (await reader.ReadAsync())
            {
                 var item = await GetProjectElement(reader);
                 list.Add(item);
            }
         }

         reader.Dispose();
     }      
}
finally
{
    connection.Close();
}

What worries me is the function

await GetProjectElement(reader)

private async Task<Project> GetProjectElement(DbDataReader reader)
{
    var item = new Project
    {
        Id = await reader.GetFieldValueAsync<int>(1),
        ParentId = await reader.IsDBNullAsync(2) ? default(int?) : await reader.GetFieldValueAsync<int>(2),
        Name = await reader.IsDBNullAsync(3) ? default(string) : await reader.GetFieldValueAsync<string>(3),
        Description = await reader.IsDBNullAsync(4) ? default(string) : await reader.GetFieldValueAsync<string>(4),
        Address = await reader.IsDBNullAsync(5) ? default(string) : await reader.GetFieldValueAsync<string>(5),
        City = await reader.IsDBNullAsync(6) ? default(string) : await reader.GetFieldValueAsync<string>(6),
        PostalCode = await reader.IsDBNullAsync(7) ? default(string) : await reader.GetFieldValueAsync<string>(7),
        Type = (ProjectTypeEnum)(await reader.GetFieldValueAsync<byte>(8)),
        StartDate = await reader.IsDBNullAsync(9) ? default(DateTime?) : await reader.GetFieldValueAsync<DateTime>(9),
        EstimatedEndDate = await reader.IsDBNullAsync(10) ? default(DateTime?) : await reader.GetFieldValueAsync<DateTime>(10),
        ActualEndDate = await reader.IsDBNullAsync(11) ? default(DateTime?) : await reader.GetFieldValueAsync<DateTime>(11),
        WebsiteUrl = await reader.IsDBNullAsync(12) ? default(string) : await reader.GetFieldValueAsync<string>(12),
        Email = await reader.IsDBNullAsync(13) ? default(string) : await reader.GetFieldValueAsync<string>(13),
        PhoneNumber = await reader.IsDBNullAsync(14) ? default(string) : await reader.GetFieldValueAsync<string>(14),
        MobilePhoneNumber = await reader.IsDBNullAsync(15) ? default(string) : await reader.GetFieldValueAsync<string>(15),
        Key = await reader.IsDBNullAsync(16) ? default(Guid?) : await reader.GetFieldValueAsync<Guid>(16),
        OrganizationElementId = await reader.GetFieldValueAsync<int>(17),
        CompanyOrganizationElementId = await reader.IsDBNullAsync(18) ? default(int?) : await reader.GetFieldValueAsync<int>(18),
        IsArchived = await reader.GetFieldValueAsync<bool>(20),
        IsDeleted = await reader.GetFieldValueAsync<bool>(21),
        CreatedOn = await reader.GetFieldValueAsync<DateTime>(22),
        CreatedBy = await reader.GetFieldValueAsync<string>(23),
        ModifiedOn = await reader.IsDBNullAsync(24) ? default(DateTime?) : await reader.GetFieldValueAsync<DateTime>(24),
        ModifiedBy = await reader.IsDBNullAsync(25) ? default(string) : await reader.GetFieldValueAsync<string>(25)
    };

    return item;
}

as you can see there are a LOT of await calls which the compiler turns into state machines, doesn't it?

You can find a simplified version of what the compiler generated code looks like here. Plenty of GOTOs which means context switching over and over again.

Since the SP was executed without specifying a CommandBehavior - the data will be in non-sequential mode. (probably the reason is a table row isn't supposed to be very large in bytes for this case of Project link)


My questions are:

1) is this abusing of the async/await without an obvious reason, because the row data is already buffered in memory, right?

2) is Task<Project> a pure overhead in this scenario?

3) would this approach actually have a worse performance compared to one without awaiting


Final thoughs: If I get things right, we would want to use CommandBehavior.SequentialAccess for large table rows where content may exceed a reasonable length thus making us want to read it asynchronously? (like storing varbinary(max) or blobs)

noobed
  • 1,329
  • 11
  • 24
  • If you are concerned about performance, then your best bet is to test it and find out. – DavidG Jan 16 '19 at 12:26
  • @DavidG I though of it too, however I struggle to define if and how to test it correctly in case of different hardware execution. I read that some modern processors have mi mimal cost to unconditional jumps. However what I've learnt so far is it's always better not to jump. – noobed Jan 16 '19 at 12:31
  • Why is hardware a concern? A comparative performance test should have 2 variations of the same code, running on the same machine. That way you can compare them together and it doesn't matter which hardware runs it. – DavidG Jan 16 '19 at 13:47
  • 3
    "Plenty of GOTOs which means context switching over and over again." 1) that's not what a context switch is, 2) "gotos" (and all the other `async` mechanics) need not be translated to actual jumps by the JIT compiler and 3) even when jumps are involved, the branch predictor will handle such situations fine when the jumps are taken the same way. All that said, yes, you probably *will* measure overhead with code that reads even the individual columns `async`. Whether it's significant in your scenario is [another issue](https://ericlippert.com/2012/12/17/performance-rant/). – Jeroen Mostert Jan 16 '19 at 14:02

1 Answers1

6

As others have noted, GOTO does not cause a context switch, and are quite fast.

1) is this abusing of the async/await without an obvious reason, because the row data is already buffered in memory, right?

ADO.NET allows implementors a lot of leeway in how exactly they implement the base types. Perhaps the row is in memory, perhaps not.

2) is Task a pure overhead in this scenario?

Yes, if the operation is actually synchronous. Which is an implementation detail of your ADO.NET provider.

Note that the state machine and await add practically no overhead here; there's an async fast path where the code just keeps executing synchronously if possible.

3) would this approach actually have a worse performance compared to one without awaiting

Probably not. First off, the performance impact is not going to be driven by what little CPU work is done to call each method and continue executing synchronously. Any performance impact you'd see would be due to the additional Task<T> instances thrown on the Gen0 heap and having to be garbage collected. This is the reason there's now a ValueTask<T>.

But even that performance impact would most likely be unnoticeable next to the network I/O call to your database server. That said, if you want to know the micro-performance penalties, the Zen of Async is a classic.

Stephen Cleary
  • 437,863
  • 77
  • 675
  • 810
  • Channel 9 link is dead (and I can't find a Microsoft backup) but I think I found the talk on Youtube: https://www.youtube.com/watch?v=zjLWWz2YnyQ . The blog link is also dead but it can be found on archive.org [here](https://web.archive.org/web/20190126143814/https://blogs.msdn.microsoft.com/lucian/2011/04/15/async-ctp-refresh-design-changes/) – jrh Apr 07 '22 at 12:35