I have a loop that retrieves objects from a 3rd party API (hence why I have to ask for each object one at a time) and adds them to a list that I then return out of my procedure. It currently does these sequentially but I'd like the loop to be asynchronous to improve performance.

The basic code looks like this:

public async Task<List<ResponseObject<MyClass>>> GetMyClass(string[] references)
{
    var responseObject = new ResponseObject<MyClass>();
    var responseObjects = new List<ResponseObject<MyClass>>();

    foreach (var reference in references)
    {
        responseObject = await GetExternalData(reference);
        responseObjects.Add(responseObject);
    }

    return responseObjects;
}

The method I call is defined as this:

public async Task<ResponseObject<MyClass>> GetExternalData(string reference)

How do I need to change this so that it returns the same list of ResponseObjects, having loaded them in parallel rather than sequentially? Any help would be gratefully received.

Duncan
  • A note on naming, as the title of your question is misleading. In your code sample, the code is _already_ asynchronous: control leaves your method at the await keyword and comes back later when the task completes. Further in your post you correctly call what you want _parallelisation_. Asynchronous != Parallel. – GrayCat May 05 '20 at 18:56
  • @GrayCat actually parallelization is also a wrong term for this case. The OP wants to achieve [concurrency](https://stackoverflow.com/questions/4844637/what-is-the-difference-between-concurrency-parallelism-and-asynchronous-methods), to have multiple operations running concurrently in other words. Unless the `GetExternalData` method is CPU-bound, which is highly unlikely. – Theodor Zoulias May 06 '20 at 00:37

1 Answer

Try doing this with a .Select:

public async Task<List<ResponseObject<MyClass>>> GetMyClass(string[] references)
{
    // ConcurrentBag (System.Collections.Concurrent) is thread-safe but does not preserve the order of references.
    var responseObjects = new ConcurrentBag<ResponseObject<MyClass>>();

    var tasks = references.Select(async item => 
    {
        var responseObject = await GetExternalData(item);
        responseObjects.Add(responseObject);
    });

    await Task.WhenAll(tasks);

    return responseObjects.ToList();
}
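
As a hedged variant of the snippet above, the lambda can return each response instead of adding it to a shared collection, and Task.WhenAll then hands back an array of results in the same order as the input. This is only a sketch: MyClass, ResponseObject<T>, the ExternalClient class, and the stubbed body of GetExternalData are hypothetical stand-ins, since the real types are not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for the types used in the question.
public class MyClass { public string Reference { get; set; } }
public class ResponseObject<T> { public T Data { get; set; } }

public class ExternalClient
{
    // Stub for the third-party call; the real implementation is not shown in the question.
    public async Task<ResponseObject<MyClass>> GetExternalData(string reference)
    {
        await Task.Delay(50); // simulate I/O latency
        return new ResponseObject<MyClass> { Data = new MyClass { Reference = reference } };
    }

    public async Task<List<ResponseObject<MyClass>>> GetMyClass(string[] references)
    {
        // Select is lazy, so ToList forces every request to start now, without awaiting any of them.
        var tasks = references.Select(r => GetExternalData(r)).ToList();

        // WhenAll completes when all tasks have completed; results are in task order.
        ResponseObject<MyClass>[] results = await Task.WhenAll(tasks);
        return results.ToList();
    }
}
```

Because nothing is written to shared state from inside the lambda, no concurrent collection is needed, and the returned list lines up with the order of references.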

There's also an AsyncEnumerator NuGet package, which includes a ParallelForEachAsync extension that works much like Parallel.ForEach. I would recommend that approach if references may contain many items: as written above, the code starts as many tasks as there are references. With ParallelForEachAsync you can cap the degree of parallelism to avoid sending too many requests at once. It would look like this:

public async Task<List<ResponseObject<MyClass>>> GetMyClass(string[] references)
{
    // ConcurrentBag (System.Collections.Concurrent) is thread-safe but does not preserve the order of references.
    var responseObjects = new ConcurrentBag<ResponseObject<MyClass>>();

    await references.ParallelForEachAsync(async item =>
    {
        var responseObject = await GetExternalData(item);
        responseObjects.Add(responseObject);   
    }, maxDegreeOfParallelism: 8);

    return responseObjects.ToList();
}
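
If adding a package is not an option, the same kind of cap can be approximated with a SemaphoreSlim from the base class library. The following is a minimal sketch under assumptions: the Throttled class and FetchAllAsync helper are hypothetical names introduced here for illustration, and the default limit of 8 simply mirrors the value used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Hypothetical helper: calls fetch for every reference with at most maxConcurrency calls in flight.
    public static async Task<List<TResult>> FetchAllAsync<TResult>(
        IEnumerable<string> references,
        Func<string, Task<TResult>> fetch,
        int maxConcurrency = 8)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = references.Select(async reference =>
        {
            await gate.WaitAsync();      // wait for a free slot
            try
            {
                return await fetch(reference);
            }
            finally
            {
                gate.Release();          // free the slot even if fetch throws
            }
        }).ToList();

        // Results come back in the same order as references.
        TResult[] results = await Task.WhenAll(tasks);
        return results.ToList();
    }
}
```

Inside GetMyClass this could be called as `return await Throttled.FetchAllAsync(references, r => GetExternalData(r), maxConcurrency: 8);`. All tasks are still created up front, but each one waits on the semaphore before it issues its request.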
Ron Beyer
  • This absolutely needs to be using a thread-safe data structure like ConcurrentBag. Alternatively return the tasks from GetExternalData in the select, await all the tasks and then synchronously add the task results to the list. – ScottyD0nt May 05 '20 at 17:04
  • @ScottyD0nt Yes, you're right, I changed it to be concurrent for thread safety. – Ron Beyer May 05 '20 at 17:08
  • It would be easier to *return* the objects from the `Select` lambda and then you get a collection of them from `WhenAll`. – Stephen Cleary May 05 '20 at 19:27
  • That's given me exactly what I wanted - thanks! I had missed the bit about using a ConcurrentBag in my feeble attempts to do it initially. – Duncan May 06 '20 at 09:41