
I have a method in which I'm retrieving a list of deployments. For each deployment I want to retrieve an associated release. Because all calls are made to an external API, I now have a foreach-loop in which those calls are made.

public static async Task<List<Deployment>> GetDeployments()
{
    try
    {
        var depjson     = await GetJson($"{BASEURL}release/deployments?deploymentStatus=succeeded&definitionId=2&definitionEnvironmentId=5&minStartedTime={MinDateTime}");
        var deployments = (JsonConvert.DeserializeObject<DeploymentWrapper>(depjson))?.Value?.OrderByDescending(x => x.DeployedOn)?.ToList();

        foreach (var deployment in deployments)
        {
            var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
            deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
        }

        return deployments;
    }
    catch (Exception)
    {
        throw;
    }
}

This all works perfectly fine. However, I do not like the await in the foreach-loop at all. I also believe this is not considered good practice. I just don't see how to refactor this so the calls are made parallel, because the result of each call is used to set a property of the deployment.

I would appreciate any suggestions on how to make this method faster and, whenever possible, avoid the await-ing in the foreach-loop.

ZiNNED
  • You could use [`Parallel.ForEach()`](https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/how-to-write-a-simple-parallel-foreach-loop) or [PLinq](https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/how-to-create-and-execute-a-simple-plinq-query), or create a task for each `GetJson()` call and wait for them all; see: [Use Task.WaitAll() to handle awaited tasks?](https://stackoverflow.com/questions/19849847/use-task-waitall-to-handle-awaited-tasks) – Jeroen van Langen Sep 25 '18 at 09:09
  • Or create a `List<Task<string>>`, add each call to the API, then `await Task.WhenAll(tasks)` http://gigi.nullneuron.net/gigilabs/avoid-await-in-foreach/ – Ric Sep 25 '18 at 09:12
  • This is a job for *TPL DataFlow*, yehaa – TheGeneral Sep 25 '18 at 09:16
  • @J.vanLangen Thanks, although I have read that `Parallel.ForEach` is recommended for CPU-intensive work and not so much for IO-related tasks. As @Ric mentioned, I could indeed put each call to the API in a `List<Task<string>>` and then map the correct Release by id later. I will look into the TPL Dataflow @Saruman mentioned as well. Thanks! – ZiNNED Sep 25 '18 at 09:25
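
The TPL Dataflow approach mentioned in the comments could look roughly like this: an `ActionBlock` runs the release lookups with bounded concurrency while still supporting `await`. This is only a sketch built on the question's own identifiers (`GetJson`, `BASEURL`, `Deployment`, `Release`); the degree of parallelism of 4 is an arbitrary choice.

```csharp
// Sketch only: assumes the question's GetJson, BASEURL, Deployment and Release.
// Requires the System.Threading.Tasks.Dataflow NuGet package.
using System.Threading.Tasks.Dataflow;

public static async Task FillReleasesAsync(List<Deployment> deployments)
{
    // Each posted deployment triggers one API call; at most 4 run at a time.
    var fetchRelease = new ActionBlock<Deployment>(
        async deployment =>
        {
            var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
            deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

    foreach (var deployment in deployments)
        fetchRelease.Post(deployment);

    fetchRelease.Complete();          // signal that no more items will arrive
    await fetchRelease.Completion;    // wait until every lookup has finished
}
```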

3 Answers


There is nothing wrong with what you are doing now. But there is a way to start all the calls at once instead of waiting for one task, processing it, and then waiting for the next one.

This is how you can turn this:

wait for one -> process -> wait for one -> process ...

into

wait for all -> process -> done

Convert this:

foreach (var deployment in deployments)
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
}

To:

var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
    deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}

First you build a list of unfinished tasks. Then you await them all and get a collection of results (the `reljson`s). Finally you deserialize each result and assign it to `Release`.

By using await Task.WhenAll() you wait for all the tasks at the same time, so you should see a performance boost from that.

Let me know if there are typos, I didn't compile this code.

FCin
  • Thanks! I went with this approach with some minor tweaks on the result. Instead of looping through all Deployments, I created a list of Releases and then mapped the right Release with the right Deployment by its Id. – ZiNNED Sep 25 '18 at 09:47
  • @ZiNNED I'm glad it worked. I'm not proud of this `for` loop myself, so I'm glad you figured out a nicer way. – FCin Sep 25 '18 at 09:48
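
The tweak ZiNNED describes (fetching all the releases first and matching them back by id) could be sketched like this; it assumes `Release` exposes an `Id` property that matches `Deployment.ReleaseId`, which the question does not show.

```csharp
// Sketch of the tweak described above. Assumes Release has an Id that
// matches Deployment.ReleaseId (not shown in the question).
var releaseTasks = deployments
    .Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"))
    .ToList();

var reljsons = await Task.WhenAll(releaseTasks);

// Index the deserialized releases by id, then map each one back
// to its deployment without relying on list positions.
var releasesById = reljsons
    .Select(json => JsonConvert.DeserializeObject<Release>(json))
    .ToDictionary(r => r.Id);

foreach (var deployment in deployments)
    deployment.Release = releasesById[deployment.ReleaseId];
```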

FCin suggested starting all the tasks, awaiting them all to finish, and only then deserializing the fetched data.

However, if the first task has already finished while the second task is still internally awaiting, the first task could already start deserializing. This would shorten the time your process spends idly waiting.

So instead of:

var deplTasks = deployments.Select(d => GetJson($"{BASEURL}release/releases/{d.ReleaseId}"));
var reljsons = await Task.WhenAll(deplTasks);
for(var index = 0; index < deployments.Count; index++)
{
    deployments[index].Release = JsonConvert.DeserializeObject<Release>(reljsons[index]);
}

I'd suggest the following slight change:

// async fetch the Release data of Deployment:
private async Task<Release> FetchReleaseDataAsync(Deployment deployment)
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    return JsonConvert.DeserializeObject<Release>(reljson);
}

// async fill the Release data of Deployment:
private async Task FillReleaseDataAsync(Deployment deployment)
{
    deployment.Release = await FetchReleaseDataAsync(deployment);
}

Then your procedure is similar to the solution that Fcin suggested:

IEnumerable<Task> tasksFillDeploymentWithReleaseData = deployments
    .Select(deployment => FillReleaseDataAsync(deployment))
    .ToList();
await Task.WhenAll(tasksFillDeploymentWithReleaseData);

Now if the first task has to wait while fetching the release data, the 2nd task begins and the third etc. If the first task already finished fetching the release data, but the other tasks are awaiting for their release data, the first task starts already deserializing it and assigns the result to deployment.Release, after which the first task is complete.

If for instance the 7th task got its data, but the 2nd task is still waiting, the 7th task can deserialize and assign the data to deployment.Release. Task 7 is completed.

This continues until all tasks are completed. Using this method there is less waiting time, because as soon as one task has its data it is scheduled to start deserializing.

Harald Coppoolse

If I understand you right and you want to make the `var reljson = await GetJson(...)` calls run in parallel:

Try this:

Parallel.ForEach(deployments, (deployment) =>
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});

You might want to limit the number of parallel executions, such as:

Parallel.ForEach(
    deployments,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    (deployment) =>
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
});

You might also want to be able to break out of the loop:

Parallel.ForEach(deployments, (deployment, state) =>
{
    var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
    deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
    if (noFurtherProcessingRequired) state.Break();
});
julian bechtold
  • `Parallel.For/Foreach` is the wrong tool for *IO bound* work, you lock resources up waiting for IO completion ports, `TPL Dataflow` is better suited for this – TheGeneral Sep 25 '18 at 09:15
  • Thanks for the suggestion, although as @Saruman already mentioned I also thought `Parallel.ForEach` should be used primarily on CPU intensive tasks and not so much for IO tasks. – ZiNNED Sep 25 '18 at 09:26
  • Parallel.ForEach does not support await. – usr Sep 25 '18 at 09:32
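
As the comments note, `Parallel.ForEach` cannot `await`. A pattern that keeps the calls asynchronous while still capping concurrency, similar in spirit to `MaxDegreeOfParallelism` above, is throttling with `SemaphoreSlim`. This is a sketch using the question's identifiers; the cap of 4 is an arbitrary choice.

```csharp
// Sketch: async-friendly throttling with SemaphoreSlim, since
// Parallel.ForEach cannot await. At most 4 requests run at once.
var throttle = new SemaphoreSlim(4);

var tasks = deployments.Select(async deployment =>
{
    await throttle.WaitAsync();
    try
    {
        var reljson = await GetJson($"{BASEURL}release/releases/{deployment.ReleaseId}");
        deployment.Release = JsonConvert.DeserializeObject<Release>(reljson);
    }
    finally
    {
        throttle.Release();   // free the slot even if the call throws
    }
}).ToList();

await Task.WhenAll(tasks);
```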