ValueTask method with unavoidable async await calls?

Question

I currently have the following async method:

private SomeObject _someObject = null;
public async Task<SomeObject> GetObjectAsync()
{
    await sslim.WaitAsync();
    if (_someObject == null)
    {
        _someObject = await InitializeSomeObjectAsync(); //starts calls to alot of async methods
    }
    sslim.Release();
    return _someObject;
}

If the above code is a hot path and called many times, is it safe/ok to change to use ValueTask?

private SomeObject _someObject = null;
public async ValueTask<SomeObject> GetObjectAsync()
{
    await sslim.WaitAsync();
    if (_someObject == null)
    {
        _someObject = await InitializeSomeObjectAsync(); //starts calls to a lot of async methods
    }
    sslim.Release();
    return _someObject;
}

What I'm unsure about is the sslim.WaitAsync locking call, which will always cause the code path to never be completely synchronous (even if _someObject has already been initialized), which is counter to using ValueTask for paths that can possible perform synchronously?

Another thought, maybe also changing the SemaphoreSlim call to the sync version would make sense?

private SomeObject _someObject = null;
public async ValueTask<SomeObject> GetObjectAsync()
{
    sslim.Wait();
    if (_someObject == null)
    {
        _someObject = await InitializeSomeObjectAsync(); //starts calls to a lot of async methods
    }
    sslim.Release();
    return _someObject;
}

I plan to perform some benchmarks on the above variations, but just wanted to get some feedback from people who are more knowledgeable as to which option would be good to consider.

What is your objective, to improve the performance or to reduce allocations? Btw I hope that in your actual code base you are calling the `sslim.Release();` in a `finally` block. — Theodor Zoulias, Aug 26 '21 at 04:45
Somewhat related: [Enforce an async method to be called once](https://stackoverflow.com/questions/28340177/enforce-an-async-method-to-be-called-once/) — Theodor Zoulias, Aug 26 '21 at 04:46

Stephen Cleary · Answer 1 · 2021-08-26T11:56:38.283

What I'm unsure about is the sslim.WaitAsync locking call, which will always cause the code path to never be completely synchronous

I'm not sure why that would be the case. Asynchronous methods may behave synchronously, and I would expect SemaphoreSlim.WaitAsync to synchronously acquire the semaphore if it's available.

which is counter to using ValueTask for paths that can possible perform synchronously?

Even if it completes asynchronously, using ValueTask<T> allows your code to avoid an allocation of a Task<T> for each invocation. If it completes synchronously, it's even more efficient, but you'll get some efficiency benefits even if it's always asynchronous. (see comments)

If it completes asynchronously, ValueTask<T> will have to do some allocations. These allocations can be pooled if you opt into it (DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS on .NET 5 or [AsyncMethodBuilder(typeof(PoolingAsyncValueTaskMethodBuilder))] on .NET 6).

I currently have the following async method

You may be interested in AsyncLazy<T> (with AsyncLazyFlags.RetryOnFailure). It uses Task<T>, but once the initialization completes successfully, it is allocation free (always returning the same Task<T> instance).

*"Even if it completes asynchronously, using `ValueTask` allows your code to avoid an allocation of a `Task` for each invocation."* Is this true? My understanding is that in case the `sslim.WaitAsync()` `Task` is not completed at the `await` point, the `GetObjectAsync()` will return a `ValueTask` that will wrap the incomplete `Task` returned by the semaphore. So what I would expect in this case is the usual allocation plus overhead. — Theodor Zoulias, Aug 26 '21 at 04:36
My answer is wrong there. The returned ValueTask *is* actually wrapping a Task from the state machine (not the one returned from `WaitAsync`). This is true unless the app opts into ValueTask pooling, in which case it could have (amortized) zero allocations. — Stephen Cleary, Aug 26 '21 at 11:52

Theodor Zoulias · Answer 2 · 2021-08-26T13:27:46.600

I made a DIY benchmark to measure the effect of switching from Task<T> to ValueTask<T>, regarding performance and allocations. As starting point I used the method below:

async Task<object> TaskOne()
{
    await Task.Yield();
    return new object();
}

I invoked and awaited this method continuously in a tight loop for one second, and then measured how many loops happened, and how many bytes were allocated in total. Then I did the same with a variant having ValueTask<object> as result, and finally I omitted the await Task.Yield(); line from both variants, to see how a synchronous completion would affect the measurements. Here is the complete benchmark:

using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    static async Task Main()
    {
        await TestAsync("Using Task<object>", true, TaskLoop);
        await TestAsync("Using ValueTask<object>", true, ValueTaskLoop);
        await TestAsync("Using Task<object>", false, TaskLoop);
        await TestAsync("Using ValueTask<object>", false, ValueTaskLoop);
    }

    static async Task TestAsync(string title, bool asynchronous,
        Func<bool, CancellationToken, Task<int>> loop)
    {
        GC.Collect();
        long mem0 = GC.GetTotalAllocatedBytes(true);
        var cts = new CancellationTokenSource(1000);
        int count = await loop(asynchronous, cts.Token);
        long mem1 = GC.GetTotalAllocatedBytes(true);
        Console.WriteLine($"{title} - " + 
            (asynchronous ? "Asynchronous" : "Synchronous") + " completion");
        Console.WriteLine($"- Loops: {count:#,0}");
        Console.WriteLine($"- Allocations: {mem1 - mem0:#,0} bytes");
        double perLoop = (mem1 - mem0) / (double)count;
        Console.WriteLine($"- Allocations per loop: {perLoop:#,0} bytes");
        Console.WriteLine();
    }

    static async Task<object> TaskOne(bool asynchronous)
    {
        if (asynchronous) await Task.Yield();
        return new object();
    }

    static async ValueTask<object> ValueTaskOne(bool asynchronous)
    {
        if (asynchronous) await Task.Yield();
        return new object();
    }

    static async Task<int> TaskLoop(bool asynchronous, CancellationToken token)
    {
        int count = 0;
        while (!token.IsCancellationRequested)
        {
            var result = await TaskOne(asynchronous);
            count++;
            if (result == null) break; // Make sure that the result is not optimized out
        }
        return count;
    }

    static async Task<int> ValueTaskLoop(bool asynchronous, CancellationToken token)
    {
        int count = 0;
        while (!token.IsCancellationRequested)
        {
            var result = await ValueTaskOne(asynchronous);
            count++;
            if (result == null) break; // Make sure that the result is not optimized out
        }
        return count;
    }
}

Try it on Fiddle.

I got these results on my PC (.NET 5, C# 9, Release build, no debugger attached):

Using Task<object> - Asynchronous completion
- Loops: 448,628
- Allocations: 61,034,784 bytes
- Allocations per loop: 136 bytes

Using ValueTask<object> - Asynchronous completion
- Loops: 416,055
- Allocations: 59,919,520 bytes
- Allocations per loop: 144 bytes

Using Task<object> - Synchronous completion
- Loops: 8,450,945
- Allocations: 811,290,792 bytes
- Allocations per loop: 96 bytes

Using ValueTask<object> - Synchronous completion
- Loops: 8,806,701
- Allocations: 211,360,896 bytes
- Allocations per loop: 24 bytes

The results I got on the Fiddle server were a bit different. It is probably running on Debug build:

Using Task<object> - Asynchronous completion
- Loops: 667,918
- Allocations: 106,889,024 bytes
- Allocations per loop: 160 bytes

Using ValueTask<object> - Asynchronous completion
- Loops: 637,380
- Allocations: 107,084,176 bytes
- Allocations per loop: 168 bytes

Using Task<object> - Synchronous completion
- Loops: 10,128,652
- Allocations: 1,377,497,176 bytes
- Allocations per loop: 136 bytes

Using ValueTask<object> - Synchronous completion
- Loops: 9,850,096
- Allocations: 709,207,232 bytes
- Allocations per loop: 72 bytes

My conclusion is that switching from Task<T> to ValueTask<T> is quite advantageous when most of the invocations return completed tasks, and it is slightly disadvantageous if all of the invocations return incomplete tasks. For your specific use case (protecting the initialization of cached values) I think that it is worth making the switch, but don't expect massive performance gains from this. There are probably better ways to improve your caching mechanism, that offer not only better performance, but also less contention under heavy usage.

ValueTask method with unavoidable async await calls?

2 Answers2