1

I've been trying to improve my understanding and usage of async code in C#, specifically how to integrate it into existing synchronous code.

I have the following test program, which is basically the test from https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/start-multiple-async-tasks-and-process-them-as-they-complete?pivots=dotnet-6-0 with a synchronous caller and a LinqPad runnable wrapper.

void Main()
{
    var a = new A();
    
    List<string> urls = new List<string>() 
        {
            "https://learn.microsoft.com/dotnet",
            "https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.whenall?view=net-6.0",
            "https://stackoverflow.com/questions/11836325/await-operator-can-only-be-used-within-an-async-method"
        };
        
    a.GetUrlContentLengths(urls).Dump();
}

public class A
{   
    public int GetUrlContentLengths(IEnumerable<string> urls)
    {
        return Task.Run<int>(async() => await GetUrlContentLengthsAsync(urls)).Result;
    }
    
public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
    System.Net.Http.HttpClient client = new System.Net.Http.HttpClient();

    IEnumerable<Task<int>> downloadTasksQuery = urls.Select(x => ProcessUrlAsync(x, client));

    var downloadTasks = downloadTasksQuery.ToList();

    int total = 0;
    
    while (downloadTasks.Any())
    {
        Task<int> finishedTask = await Task.WhenAny(downloadTasks);
        downloadTasks.Remove(finishedTask);
        total += await finishedTask;
    }
    
    return total;
}
        
    
    public  async Task<int> ProcessUrlAsync(string url, System.Net.Http.HttpClient client)
    {
        byte[] content = await client.GetByteArrayAsync(url);
        Console.WriteLine($"{url,-60} {content.Length,10:#,#}");

        return content.Length;
    }
}

This linked document describes the O(n²) issues like this:

What we’ve effectively created here is an O(N2) algorithm: for each task, we search the list for the task to remove it, which is an O(N) operation, and we register a continuation with each task, which is also an O(N) operation

So would this minor change to a Dictionary fix this and leave the whole thing as an O(n) operation?

public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
    {
        System.Net.Http.HttpClient client = new System.Net.Http.HttpClient();

        IEnumerable<Task<int>> downloadTasksQuery = urls.Select(x => ProcessUrlAsync(x, client));

        var downloadTasks = downloadTasksQuery.ToDictionary(xk => xk.GetHashCode(), xv => xv);

        int total = 0;
        
        while (downloadTasks.Any())
        {
            Task<int> finishedTask = await Task.WhenAny(downloadTasks.Values);
            downloadTasks.Remove(finishedTask.GetHashCode());
            total += await finishedTask;
        }
        
        return total;
    }
Whelkaholism
  • 1,551
  • 3
  • 18
  • 28
  • 3
    You could use `HashSet>` instead of Dictionary. – Peter Csala May 17 '22 at 09:06
  • No, this wouldn't make the operation `O(n)`. The dictionary values are just a collection of tasks, exactly the same as in your first example. The only difference with the dictionary is that you risk an exception, if more than one task generates the same hash code. – Johnathan Barclay May 17 '22 at 09:23
  • 2
    @JohnathanBarclay List.Remove is an O(n) operation while Dictionary.Remove is rather O(1). – Ralf May 17 '22 at 09:30
  • 1
    @Ralf Ah yes, but there is still `WhenAny` inside the `while` which is `O(n)`. – Johnathan Barclay May 17 '22 at 09:39
  • @Ralf's comment is the key one; the MS docs explicitly state that the list search is one of the O(n)s which is why I am questioning whether replacing it will fix it. The hash collision issue is a good point from Jonathan, I didn't consider that – Whelkaholism May 17 '22 at 13:46
  • 2
    Just take the task as key. The hash thing then and potential collision are then a Dictionary magic you don't have to deal with. – Ralf May 17 '22 at 13:52
  • @Ralf I just came back to say I'd found out that the Dictionary magically does that :) I guess it'd increase the complexity away from O(n) the more collisions, but still better than a list? – Whelkaholism May 17 '22 at 14:01
  • If there is a hash collision with multiple task the dictionary needs to call equals on those Tasks then to find the correct one. But becoming a relevant factor here your count of Tasks needs to reach somewhere around Int.MaxValue. I'm sure you hit other problems way before. – Ralf May 17 '22 at 14:06
  • 2
    [Jon Skeet](https://codeblog.jonskeet.uk/2012/01/16/eduasync-part-19-ordering-by-completion-ahead-of-time/), [Stephen Toub](https://devblogs.microsoft.com/pfxteam/processing-tasks-as-they-complete/), and [myself](https://github.com/StephenCleary/AsyncEx/blob/0361015459938f2eb8f3c1ad1021d19ee01c93a4/src/Nito.AsyncEx.Tasks/TaskExtensions.cs#L179-L260) all have non-O(N^2) solutions. But as [Theodor points out](https://stackoverflow.com/a/72276921/263693), avoiding the "do x as each task completes" *entirely* results in cleaner and more maintainable (as well as more efficient) code. – Stephen Cleary May 18 '22 at 00:58
  • @StephenCleary Stephen Toub's post was one of my reading materials actually. In my case though, I don't care what order they complete. I just need to combine the results of each when everything has finished. Linguistically I would actually expect to be able to use `WhenAll` for this but I'm still reading about that and I can't work out how to get the results! – Whelkaholism May 19 '22 at 08:52

1 Answers1

2

So would this minor change to a Dictionary fix this and leave the whole thing as an O(n) operation?

No. Searching a List<T> is indeed a O(n) operation, but eliminating this operation does not eliminate all O(n) operations that are happening inside the while loop. There is one more O(n) operation hidden inside the Task.WhenAny method, which has a far greater impact (overhead) at slowing down your code than searching in the list. The hidden operation is the attaching of continuations on all incomplete tasks in the downloadTasks collection, and then detaching these continuations when any of the tasks completes. That's a lot of work to do, because it involves memory allocations and synchronization overhead, and the only way to avoid it is to avoid using the WhenAny-in-a-loop antipattern. Here is an alternative O(n) implementation of your algorithm. It's O(n) because only one continuation is attached on each task, by the Task.WhenAll method:

public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
    HttpClient client = new();

    int total = 0;

    Task<int>[] higherOrderTasks = urls.Select(async url =>
    {
        int result = await ProcessUrlAsync(url, client).ConfigureAwait(false);
        Interlocked.Add(ref total, result);
        return result;
    }).ToArray();

    await Task.WhenAll(higherOrderTasks);

    return total;
}

A higher order task is created for each ProcessUrlAsync task, that wraps this task and incorporates the code that should run when the task completes. The continuations after the await ProcessUrlAsync might run concurrently to each other, so you might have to synchronize the access to any shared state that you might have to mutate, like the total variable in the above example. Unless you know for sure that your code will run on a SynchronizationContext that will synchronize the continuations, in which case you should also remove the .ConfigureAwait(false). In this specific case it is actually possible to get rid of the higher order tasks and the shared state altogether, like this:

public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
    HttpClient client = new();

    Task<int>[] tasks = urls
        .Select(url => ProcessUrlAsync(url, client))
        .ToArray();

    int[] results = await Task.WhenAll(tasks);

    return results.Sum();
}
Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • 1
    Ah okay, so the O(n2) is caused by the `while` loop PLUS any O(n) within it? That's what I was missing, thanks. Your solution is beyond my current understanding of async programming but I will study it and do some reading! Marked as the answer because it does indeed answer the original question. – Whelkaholism May 19 '22 at 08:44
  • 1
    @Whelkaholism the term "higher order task" might sound too deep and scary, and in fact it's the first time that I've used this term. It's another way of saying that async/await is composable: you can write an async method that `await`s another async method. By doing so you you can incorporate bits of functionality that are missing from the inner async method, that are specific to the problem that you are trying to solve. – Theodor Zoulias May 19 '22 at 08:50