-1

I'm seeing some odd behavioral differences when calling Task.WhenAll(IEnumerable<Task<T>>) and calling Task.WhenAll(List<Task<T>>) while trying to catch exceptions

My code is as follows:

public async Task Run()
{
    var en = GetResources(new []{"a","b","c","d"});
    await foreach (var item in en)
    {
        var res = item.Select(x => x.Id).ToArray();
        System.Console.WriteLine(string.Join("-> ", res));
    }
}

private async IAsyncEnumerable<IEnumerable<ResponseObj>> GetResources(
    IEnumerable<string> identifiers)
{
    IEnumerable<IEnumerable<string>> groupedIds = identifiers.Batch(2);
        // MoreLinq extension method -- batches IEnumerable<T>
        // into IEnumerable<IEnumerable<T>>
    foreach (var batch in groupedIds)
    {
        //GetHttpResource is simply a wrapper around HttpClient which
        //makes an Http request to an API endpoint with the given parameter
        var tasks = batch.Select(id => ac.GetHttpResourceAsync(id)).ToList();
            // if I remove this ToList(), the behavior changes
        var stats = tasks.Select(t => t.Status);
            // at this point the status being WaitingForActivation is reasonable
            // since I have not awaited yet
        IEnumerable<ResponseObj> res = null;
        var taskGroup = Task.WhenAll(tasks);
        try
        {
            res = await taskGroup;
            var awaitedStats = tasks.Select(t => t.Status);
                //this is the part that changes
                //if I have .ToList(), the statuses are RanToCompletion or Faulted
                //if I don't have .ToList(), the statuses are always WaitingForActivation
        }
        catch (Exception ex)
        {
            var exceptions = taskGroup.Exception.InnerException;
            DoSomethingWithExceptions(exceptions);
            res = tasks.Where(g => !g.IsFaulted).Select(t => t.Result);
                //throws an exception because all tasks are WaitingForActivation
        }
        yield return res;
    }
}

Ultimately, I have an IEnumerable of identifiers, I'm batching that into batches of 2 (hard coded in this example), and then running Task.WhenAll to run each batch of 2 at the same time.

What I want is if 1 of the 2 GetResource tasks fails, to still return the successful result of the other, and handle the exception (say, write it to a log).

If I run Task.WhenAll on a list of tasks, this works exactly how I want. However, if I remove the .ToList(), when I attempt to find my faulted tasks in the catch block after the await taskGroup, I run into problems because the statuses of my tasks are still WaitingForActivation although I believe they have been awaited.

When there is no exception thrown, the List and IEnumerable act the same way. This only starts causing issues when I try to catch exceptions.

What is the reasoning behind this behavior? The Task.WhenAll must have completed since I get into the catch block, however why are the statuses still WaitingForActivation? Have I failed to grasp something fundamental here?

istrupin
  • 1,423
  • 16
  • 32
  • 2
    `ToList` is starting tasks earlier... but it should not matter. Could you please review [MCVE] guidelines and [edit] post to clarify if `yield return` / `IAsyncEnumerable` is related to the problem? – Alexei Levenkov Apr 06 '20 at 19:26
  • 2
    "at this point the status is WaitingForActivation which is what I expect" - is in general wrong expectation for started tasks like shown in the code... Showing simplified `GetResourceAsync` would be nice. – Alexei Levenkov Apr 06 '20 at 19:28
  • 4
    Unless you make the list concrete (by using `ToList()`), each time you enumerate over the list you are calling `GetHttpResourceAsync` again, and creating a new task. This is due to the lazy evaluation. I would definitely keep the `ToList()` call when working with a list of tasks. – John Wu Apr 06 '20 at 20:16
  • 1
    John Wu's comment above explains completely your observations. As a side note, if you are batching your input with the intention of controlling the level of concurrency, a much better tool for the job is the [TPL Dataflow](https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library) library. Probably all you need is a [`TransformBlock`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.dataflow.transformblock-2) with `MaxDegreeOfParallelism = 2`. [Here](https://stackoverflow.com/a/59574072/11178549) is an example. – Theodor Zoulias Apr 06 '20 at 20:53
  • That makes sense -- at least I think I understand. If I'm reading that correctly, it's not that my task never completed, but actually the fact that by trying to check its status I created a new task instead. @TheodorZoulias TPL Dataflow is definitely something I'd like to consider but I want to combine the results from all of my tasks first and then send that combined result for processing. I am writing the results to a DB, and would like to write in fewer big batches rather than many small batches – istrupin Apr 06 '20 at 21:04
  • 1
    For getting all the results from a `TransformBlock` and saving them later to the DB, you probably don't need to stream the results back as an `IAsyncEnumerable`. A return value of `Task>` seems more suitable. [Here](https://stackoverflow.com/a/58751948/11178549) is a `ToListAsync` extension method for dataflow blocks that does exactly that. After getting the results, you could save them in batches with a [`BatchBlock`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.dataflow.batchblock-1) linked to an `ActionBlock`. – Theodor Zoulias Apr 06 '20 at 21:17
  • 1
    Can I kindly ask why the vote to close as off topic? Per the [on-topic page](https://stackoverflow.com/help/on-topic) I believe this question is reproducible, includes desired behavior, and is "a practical, answerable problem that is unique to software development" . If I misunderstood I apologize -- I would just like to understand how. – istrupin Apr 07 '20 at 00:29

1 Answers1

2

Unless you make the list concrete (by using ToList()), each time you enumerate over the list you are calling GetHttpResourceAsync again, and creating a new task. This is due to the deferred execution.

I would definitely keep the ToList() call when working with a list of tasks

John Wu
  • 50,556
  • 8
  • 44
  • 80