
I am trying to understand parallel programming and I would like my async methods to run on multiple threads. I have written something but it does not work like I thought it should.

Code

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    public static async Task Main(string[] args)
    {
        var listAfterParallel = RunParallel(); // Call RunParallel, which returns a Task
        await Task.WhenAll(listAfterParallel); // I want execution to wait here until all the tasks have completed
        Console.WriteLine("After Parallel Loop"); // But when I run the program, this line is printed first
        Console.ReadLine();
    }

    public static async Task<ConcurrentBag<string>> RunParallel()
    {
        var client = new System.Net.Http.HttpClient();
        client.DefaultRequestHeaders.Add("Accept", "application/json");
        client.BaseAddress = new Uri("https://jsonplaceholder.typicode.com");
        var list = new List<int>();
        var listResults = new ConcurrentBag<string>();
        for (int i = 1; i < 5; i++)
        {
            list.Add(i);
        }
        // Parallel.ForEach so that the awaited calls run on multiple threads
        Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, async (index) =>
        {
            var response = await client.GetAsync("posts/" + index);
            var contents = await response.Content.ReadAsStringAsync();
            listResults.Add(contents);
            Console.WriteLine(contents);
        });
        return listResults;
    }
}

I would like the RunParallel function to complete before "After Parallel Loop" is printed. I also want my requests for the posts to run on multiple threads.

Any help would be appreciated!

AbdelAziz AbdelLatef
Learn AspNet
  • Parallel is not the same thing as async. One runs work on multiple threads; the other avoids blocking a thread while it waits for something (often I/O, but possibly another thread finishing some work). If you want the I/O to happen in parallel, just collect the tasks and get rid of the Parallel.ForEach – juharr Oct 04 '19 at 15:56
  • `async/await` are *not* meant for parallelism; they help with asynchronous operations. `Parallel.ForEach` is meant for data parallelism (crunching 100K/1M items locally) and is definitely not meant for async work. In fact, it *can't* await any async operations. This code will fire off *all requests* at the same time and never wait for the results – Panagiotis Kanavos Oct 04 '19 at 15:56
  • In any case, asynchronous operations *already* run on another thread or don't bother their creator until they finish. You could use e.g. `var results = await Task.WhenAll(Enumerable.Range(1,5).Select(i=>client.GetStringAsync($"posts/{i}")));` to fire off all 5 requests and await their results without blocking – Panagiotis Kanavos Oct 04 '19 at 16:00
  • @PanagiotisKanavos will this run in multiple threads? – Learn AspNet Oct 04 '19 at 16:10
  • @PanagiotisKanavos What would you use instead of Parallel.ForEach? Can we use ActionBlock for data parallelism and also for async work? – Learn AspNet Oct 04 '19 at 16:15
  • @LearnAspNet Parallel.ForEach takes a lot of CPU to run. It is not recommended on the server side; it can simply bring the machine down – OlegI Oct 04 '19 at 16:47
  • @PanagiotisKanavos asynchronous operations (I mean async/await) do not always run on separate threads. It depends on how much time the operation takes. But most of the time, yes, it runs on a separate thread – OlegI Oct 04 '19 at 16:50
  • @OlegI I want it to run in parallel. How can I make sure it runs on multiple cores if there are a million records to process? – Learn AspNet Oct 04 '19 at 17:13

2 Answers


What's happening here is that you're never waiting for the Parallel.ForEach block to complete; you're just returning the bag that it will eventually fill. Parallel.ForEach expects Action delegates, so your async lambda becomes an async void delegate rather than one that returns a Task. An async void method is valid, but it returns to its caller as soon as it awaits a Task that hasn't completed yet, and the rest of its work runs later as a continuation (on a thread pool thread in a console app). Parallel.ForEach therefore thinks each handler is done, even though its remaining work hasn't run yet.
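A minimal sketch of that behaviour, assuming `Task.Delay` as a stand-in for the HTTP calls so it runs offline (`AsyncVoidLambdaDemo` and `RunAsync` are hypothetical names). The counter is still below 4 when `Parallel.ForEach` returns and only reaches 4 later, once the orphaned continuations have run.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVoidLambdaDemo
{
    // Returns how many items had finished when Parallel.ForEach returned,
    // and how many had finished after an extra (crude) wait.
    public static async Task<(int AtReturn, int AfterWait)> RunAsync()
    {
        int completed = 0;

        // The overload used here takes Action<int>, so this async lambda becomes async void.
        Parallel.ForEach(Enumerable.Range(1, 4),
            new ParallelOptions { MaxDegreeOfParallelism = 2 },
            async index =>
            {
                await Task.Delay(500); // stand-in for client.GetAsync(...)
                Interlocked.Increment(ref completed);
            });

        // Each delegate "returned" at its first await, so the loop is already over.
        int atReturn = Volatile.Read(ref completed);

        // Give the orphaned continuations time to run; real code cannot rely on this.
        await Task.Delay(3000);
        int afterWait = Volatile.Read(ref completed);

        return (atReturn, afterWait);
    }
}
```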

Instead, use a synchronous delegate here:

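// Synchronous delegate: .Result blocks this worker thread until each call completes,
// so Parallel.ForEach does not return until every item has been processed.
Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, index =>
{
    var response = client.GetAsync("posts/" + index).Result;

    var contents = response.Content.ReadAsStringAsync().Result;
    listResults.Add(contents);
    Console.WriteLine(contents);
});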

If you absolutely must use await inside, wrap it in Task.Run(...).GetAwaiter().GetResult():

Parallel.ForEach(list, new ParallelOptions() { MaxDegreeOfParallelism = 2 }, index => Task.Run(async () =>
{
    var response = await client.GetAsync("posts/" + index);

    var contents = await response.Content.ReadAsStringAsync();
    listResults.Add(contents);
    Console.WriteLine(contents);
}).GetAwaiter().GetResult()); // the final ')' closes the Parallel.ForEach call

In this case, however, Task.Run queues the work to a thread pool thread, so we've subverted most of Parallel.ForEach's control; it's better to use async all the way down:

var tasks = list.Select(async (index) => {
        var response = await client.GetAsync("posts/" + index);

        var contents = await response.Content.ReadAsStringAsync();
        listResults.Add(contents);
        Console.WriteLine(contents);
    });
await Task.WhenAll(tasks);

Since Select expects a Func<T, TResult>, the compiler treats an async lambda with no return value as returning Task rather than as async void, which gives us a sequence of tasks we can explicitly await.
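A slightly fuller sketch of the async-all-the-way-down version, assuming you'd rather get the strings back directly than fill a `ConcurrentBag`: each task returns its contents and `Task.WhenAll` hands back the results in input order. The `fetch` parameter is a hypothetical seam so the method can run without the network; `AsyncAllTheWay` and `PostsFetcher` are likewise hypothetical names, the latter showing how the question's `HttpClient` (with its `BaseAddress`) could be plugged in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class AsyncAllTheWay
{
    // Starts one task per id, then awaits them all; results come back in input order.
    public static async Task<string[]> FetchAllAsync(IEnumerable<int> ids, Func<int, Task<string>> fetch)
    {
        // Select with an async lambda produces Task<string> values, not async void delegates.
        // ToList() runs the query once, so each request is started exactly once.
        List<Task<string>> tasks = ids.Select(async id =>
        {
            string contents = await fetch(id);
            Console.WriteLine(contents);
            return contents;
        }).ToList();

        // Completes once every task has finished; if any failed, awaiting it rethrows the first exception.
        return await Task.WhenAll(tasks);
    }

    // Hypothetical adapter for the question's HttpClient (BaseAddress already set).
    public static Func<int, Task<string>> PostsFetcher(HttpClient client) =>
        id => client.GetStringAsync("posts/" + id);
}
```

From the question's Main, something like `var posts = await AsyncAllTheWay.FetchAllAsync(list, AsyncAllTheWay.PostsFetcher(client));` would wait for every request before "After Parallel Loop" is printed. The requests overlap because they are all in flight at once, not because each one holds a thread.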

David
  • I like your approach, but can you confirm whether list.Select will run on multiple threads in parallel? I also want to pass an option for how many cores it can use – Learn AspNet Oct 04 '19 at 17:11
  • Running in parallel anytime you're using Tasks is something controlled by the Synchronization Context. I'll update to show a solution using Select and AsParallel to make those same controls – David Oct 04 '19 at 17:13
  • Also, can we use action blocks for async methods? I also want to pass the number of cores/threads they can use – Learn AspNet Oct 04 '19 at 17:18
  • If you want to use Async methods but define maximum parallelization, you'll need to mess with your local TaskScheduler, which is a larger scope problem thanks to TaskScheduler being abstract – David Oct 04 '19 at 17:27
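If the goal is just to cap how many requests are in flight at once (the `MaxDegreeOfParallelism = 2` in the question), a commonly used alternative to a custom `TaskScheduler` is a `SemaphoreSlim` gate. This is a swapped-in technique, not the approach described in the answer above; `ThrottledAsync` and the `fetch` parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledAsync
{
    // Runs fetch for every id, but never more than maxConcurrency calls at the same time.
    public static async Task<string[]> FetchAllAsync(
        IEnumerable<int> ids, Func<int, Task<string>> fetch, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency))
        {
            List<Task<string>> tasks = ids.Select(async id =>
            {
                await gate.WaitAsync(); // asynchronously wait for a free slot (no thread blocked)
                try
                {
                    return await fetch(id);
                }
                finally
                {
                    gate.Release(); // free the slot even if fetch throws
                }
            }).ToList();

            // Results come back in input order; the semaphore is disposed only after all tasks finish.
            return await Task.WhenAll(tasks);
        }
    }
}
```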

Take a look at this: There Is No Thread

When you are making multiple concurrent web requests, it's not your CPU that is doing the hard work. It's the CPU of the web server that is serving your requests. Your CPU is doing nothing during this time. It's not in a special "wait state" or anything like that. The hardware inside your box that is working is your network card, which writes data to your RAM. When the response is received, your CPU is notified about the arrived data so it can do something with it.

You need parallelism when you have heavy work to do inside your box, not when you want the heavy work to be done by the external world. From the point of view of your CPU, even your hard disk is part of the external world. So everything that applies to web requests also applies to requests targeting filesystems and databases. These workloads are called I/O bound, to distinguish them from so-called CPU bound workloads.
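For contrast, a CPU bound workload is where `Parallel.For`/`Parallel.ForEach` does fit. A minimal sketch, assuming prime counting by trial division as the stand-in for "heavy work inside your box" (`CpuBoundDemo` is a hypothetical name): every iteration keeps a core busy, and `MaxDegreeOfParallelism` caps how many iterations run concurrently.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CpuBoundDemo
{
    // Trial division: deliberately CPU-heavy, no I/O at all.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Counts primes in [0, limit) using Parallel.For with a per-thread subtotal.
    public static int CountPrimes(int limit, int maxDegreeOfParallelism)
    {
        int total = 0;
        Parallel.For(0, limit,
            new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism },
            () => 0,                                                   // each worker starts its subtotal at 0
            (n, state, subtotal) => IsPrime(n) ? subtotal + 1 : subtotal,
            subtotal => Interlocked.Add(ref total, subtotal));         // merge each subtotal once
        // Parallel.For returns only after every iteration and every merge has run.
        return total;
    }
}
```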

For I/O bound workloads, the tool offered by the .NET platform is the asynchronous Task. There are many APIs throughout the libraries that return Task objects. To achieve concurrency, you typically start multiple tasks and then await them with Task.WhenAll. There are also more advanced tools like the TPL Dataflow library, which is built on top of Tasks. It offers capabilities like buffering, batching, configuring the maximum degree of concurrency, and much more.
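A minimal sketch of the TPL Dataflow route, which also speaks to the question about `ActionBlock` in the comments: unlike `Parallel.ForEach`, `ActionBlock<T>` accepts an async delegate and awaits it, and `MaxDegreeOfParallelism` limits how many delegates run at once. It assumes the `System.Threading.Tasks.Dataflow` library is available to your project (a separate NuGet package on some target frameworks); `DataflowDemo` and `fetch` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowDemo
{
    public static async Task<ConcurrentBag<string>> FetchAllAsync(
        IEnumerable<int> ids, Func<int, Task<string>> fetch, int maxDegreeOfParallelism)
    {
        var results = new ConcurrentBag<string>();

        // The async lambda binds to the Func<int, Task> overload, so the block awaits each call
        // and runs at most maxDegreeOfParallelism of them concurrently.
        var block = new ActionBlock<int>(async id =>
        {
            string contents = await fetch(id);
            results.Add(contents);
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism });

        foreach (int id in ids)
            block.Post(id);     // queue the work item (the block is unbounded by default)

        block.Complete();       // signal that no more input is coming
        await block.Completion; // finishes once every queued item has been processed
        return results;
    }
}
```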

Theodor Zoulias