
I have a method that converts a CSV file into a particular model, and I want to split the work across multiple tasks because there are 700k+ records. I'm using .Skip and .Take in the method so each invocation knows where to start and how many records to take. I have a list of numbers 1-10 that I want to iterate over, creating one task per number and using that number in some math to determine how many records to skip.

Here's how I'm creating the tasks:

var numberOfTasksList = Enumerable.Range(1, 10).ToList();
//I left out the math to determine rowsPerTask used as a parameter in the below method for brevity's sake
var tasks = numberOfTasksList.Select(i =>
    ReadRowsList<T>(props, fields, csv, context, zohoEntities, i, i * rowsPerTask, (i - 1) * rowsPerTask));

await Task.WhenAll(tasks);
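The math for rowsPerTask is left out above, so the following is only a hypothetical sketch of one way to turn the task number i (1 to 10) into Skip/Take values, assuming the total row count is known up front. totalRows, taskCount and GetRange are illustrative names that are not part of the original code, and the row count is made up.

```csharp
using System;
using System.Linq;

// Hypothetical inputs: the question only says "700k+ records" and 10 tasks.
int totalRows = 700_005;
int taskCount = 10;

// Ceiling division, so rowsPerTask * taskCount >= totalRows and no rows are dropped.
int rowsPerTask = (totalRows + taskCount - 1) / taskCount;

// Task i (1-based) skips the rows owned by tasks 1..i-1 and takes at most
// rowsPerTask rows; the last task takes whatever is left.
(int Skip, int Take) GetRange(int i)
{
    int start = (i - 1) * rowsPerTask;
    int count = Math.Max(0, Math.Min(rowsPerTask, totalRows - start));
    return (start, count);
}

var ranges = Enumerable.Range(1, taskCount).Select(i => GetRange(i)).ToList();

foreach (var (skip, take) in ranges)
    Console.WriteLine($"Skip({skip}).Take({take})");
```

The skip value here matches the (i - 1) * rowsPerTask expression in the call above. If the rows come from a streaming source (for example a reader over the file), Skip still has to read past every skipped row, so later tasks do more reading than earlier ones.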

The ReadRowsList method looks like this (parameters omitted):

public static async Task<string> ReadRowsList<T>(...parameters) where T : class, new()
{
    //work to run
    return $"added rows for task {i}";
}

The string the method returns is just a simple message, $"added rows for task {i}", so it isn't really proper async/await: I'm only returning a string to signal that the iteration is done.

However, when I run the program, the first iteration (where i=1) runs to completion before the second iteration starts, so nothing runs in parallel. I'm not the best when it comes to async/parallel programming: is there something obvious that would make each task wait for the previous one to finish before it starts? My understanding was that creating the tasks as above and awaiting Task.WhenAll(tasks) would create a new thread for each iteration, but I must be missing something.

jmath412
  • Are you getting a warning about an `async` method lacking an `await` operator? – Theodor Zoulias Mar 10 '22 at 21:00
  • Note that `WhenAll` isn't going to run the tasks in parallel. It will result in starting the first one that will go until it hits an `await` where it will yield the thread and allow the next one to start and so on with the continuations after the first await and any additional awaits after that being handled as needed. – juharr Mar 10 '22 at 21:03
  • Btw have you considered using the `Parallel.ForEach` and the `Partitioner` class ([example 1](https://stackoverflow.com/questions/70839214/adding-two-arrays-in-parallel/70840104#70840104), [example 2](https://stackoverflow.com/questions/66736623/why-parallel-multithread-code-execution-is-slower-than-sequential/66737611#66737611)), instead of spawning tasks and partitioning your work manually? – Theodor Zoulias Mar 10 '22 at 21:04
  • @TheodorZoulias Yes, actually the ReadRowsList where the work is done gives that warning, I just didn't see the squiggles underneath the method name. – jmath412 Mar 10 '22 at 21:06
  • What you have is how you do async methods in parallel (run the non-async parts of the tasks while the async parts are waiting), but without any awaits it might as well just be sync code. – juharr Mar 10 '22 at 21:07
  • @juharr Ahhh, okay that makes sense. – jmath412 Mar 10 '22 at 21:08
  • @TheodorZoulias I had tried to use Parallel.ForEach on another part of the program, but it prevented me from seeing what was being read to the screen of our program as I'm using HangFire's PerformContext, but I can see if I can work around it for this portion. Thank you for the example. – jmath412 Mar 10 '22 at 21:21
  • Simple rule of thumb is to use Parallel.ForEach if your code is CPU bound (complex math computations) and async/await if your code is IO bound (File IO, DB calls, web service calls). – juharr Mar 10 '22 at 21:26
  • I'm able to get the Parallel.ForEach to run and I'm able to see which tasks the records are coming from, but it's running at the same speed as when it was just running sequentially. I figured having 10 processes running in parallel would significantly increase the speed of the csv conversion process. Is there something I'm missing where maybe everything is on the same thread or the individual tasks, even in parallel, are slowed down? I tried with just 5 processes and still the same speed. – jmath412 Mar 11 '22 at 00:07
  • @juharr If you want to add your explanations you gave in the first couple of comments as the answer I'll accept it. That explains why my code wasn't running as expected. The time between awaits was essentially the entire time to run one iteration of the method, essentially synchronous like you said. – jmath412 Mar 11 '22 at 00:11
  • @jmath412 Not all workloads are parallelizable. If your main work is to read data from the filesystem, then it depends on how your hardware storage device behaves with multiple readers. Some devices, like most classic hard disks, perform best with a single reader, and adding more readers slows them down (reduces the overall flow of data). In such cases the only way to increase the performance is to upgrade the hardware. – Theodor Zoulias Mar 11 '22 at 05:41
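The Parallel.ForEach plus Partitioner approach suggested in the comments could look roughly like the following sketch. It assumes the CSV lines are already in memory and that converting a row is CPU-bound; lines and ConvertRow are hypothetical stand-ins for the real CSV reading and model mapping.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the CSV lines (the real code reads a file).
string[] lines = Enumerable.Range(0, 700_000).Select(n => $"{n},name{n}").ToArray();

// Hypothetical conversion; replace with the real CSV-to-model mapping.
static int ConvertRow(string line) => int.Parse(line.Split(',')[0]);

var results = new int[lines.Length];

// Partitioner.Create splits [0, lines.Length) into contiguous ranges of
// 50,000 indexes; Parallel.ForEach hands those ranges to thread-pool threads.
var ranges = Partitioner.Create(0, lines.Length, rangeSize: 50_000);

Parallel.ForEach(ranges, range =>
{
    // range.Item1 is inclusive, range.Item2 is exclusive.
    for (int j = range.Item1; j < range.Item2; j++)
        results[j] = ConvertRow(lines[j]);
});

Console.WriteLine($"Converted {results.Length} rows");
```

Each index is written by exactly one range, so the results array needs no locking. As the comments point out, this only helps if the conversion, rather than reading the file from storage, is the bottleneck.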

1 Answer


In short:

  1. async does not equal multiple threads; and
  2. making a function async Task does not make it asynchronous

When Task.WhenAll is given "pretend" async methods that contain no awaits, each call runs its whole body before it returns its task, so the current thread cannot 'let go' of the task at hand and start processing another one.
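The same effect can be seen in isolation in the sketch below (PretendAsync and RealAsync are made-up names for illustration): an async method without any await has already finished by the time it hands back its Task, whereas one that awaits an incomplete Task returns to the caller early.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// No await inside: the compiler would emit warning CS1998, and the whole body
// runs synchronously on the caller's thread before the Task is returned.
#pragma warning disable CS1998
static async Task<string> PretendAsync(int i)
{
    Thread.Sleep(100); // blocks the caller; nothing yields
    return $"added rows for task {i}";
}
#pragma warning restore CS1998

// A real await on an incomplete Task returns control to the caller;
// the rest of the method runs later as a continuation.
static async Task<string> RealAsync(int i)
{
    await Task.Delay(100);
    return $"added rows for task {i}";
}

Task<string> pretend = PretendAsync(1);
bool pretendDoneOnReturn = pretend.IsCompleted; // true: the body already ran

Task<string> real = RealAsync(1);
bool realDoneOnReturn = real.IsCompleted; // false: Task.Delay(100) is still pending

Console.WriteLine($"PretendAsync completed on return: {pretendDoneOnReturn}");
Console.WriteLine($"RealAsync completed on return: {realDoneOnReturn}");
Console.WriteLine(await real);
```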

As was pointed out in the comments, the compiler warns you about this with CS1998: "This async method lacks 'await' operators and will run synchronously. Consider using the 'await' operator to await non-blocking API calls, or 'await Task.Run(...)' to do CPU-bound work on a background thread."
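Following the warning's own suggestion, one way to make the question's code actually run concurrently is to keep the row-reading work synchronous and offload each chunk with Task.Run. This is a sketch under assumptions: ReadRows is a hypothetical synchronous stand-in for ReadRowsList, and Thread.Sleep merely simulates the work.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical synchronous stand-in for the CSV work in ReadRowsList.
static string ReadRows(int i)
{
    Thread.Sleep(500); // simulate CPU-bound or blocking work
    return $"added rows for task {i} on thread {Environment.CurrentManagedThreadId}";
}

var sw = Stopwatch.StartNew();

// Task.Run queues each call to the thread pool, so WhenAll no longer
// executes the bodies one after another on the calling thread.
var tasks = Enumerable.Range(1, 10).Select(i => Task.Run(() => ReadRows(i)));
string[] messages = await Task.WhenAll(tasks);

sw.Stop();
foreach (var m in messages) Console.WriteLine(m);
Console.WriteLine($"Elapsed: {sw.ElapsedMilliseconds} ms (sequential would be about 5000 ms)");
```

Task.WhenAll returns the results in the order of the input tasks, not in completion order. Task.Run suits CPU-bound or blocking work like this; for genuinely I/O-bound work, awaiting the library's own async APIs is the better fit, in line with the rule of thumb in the comments.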

Trivial example

Let's consider two functions with identical signatures, one with real async code and one without.

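using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// No await: compiles with warning CS1998 and runs synchronously.
static async Task DoWorkPretendAsync(int taskId)
{
    Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > start");
    Thread.Sleep(TimeSpan.FromSeconds(1)); // blocks the calling thread
    Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > done");
}

// Real await: yields to the caller while the delay is pending.
static async Task DoWorkAsync(int taskId)
{
    Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > start");
    await Task.Delay(TimeSpan.FromSeconds(1)); // returns an incomplete Task to the caller
    Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > done");
}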

If we test them with the following snippet

await DoItAsync(DoWorkPretendAsync);
Console.WriteLine();
await DoItAsync(DoWorkAsync);

async Task DoItAsync(Func<int, Task> f)
{
    var tasks = Enumerable.Range(start: 0, count: 3).Select(i => f(i));
    Console.WriteLine("Before WhenAll");
    await Task.WhenAll(tasks);
    Console.WriteLine("After WhenAll");
}

we can see that with DoWorkPretendAsync the tasks are executed sequentially, whereas with DoWorkAsync all three tasks start before any of them finishes.

Before WhenAll
Thread: 1 -> task:0 > start
Thread: 1 -> task:0 > done
Thread: 1 -> task:1 > start
Thread: 1 -> task:1 > done
Thread: 1 -> task:2 > start
Thread: 1 -> task:2 > done
After WhenAll

Before WhenAll
Thread: 1 -> task:0 > start
Thread: 1 -> task:1 > start
Thread: 1 -> task:2 > start
Thread: 5 -> task:0 > done
Thread: 5 -> task:2 > done
Thread: 7 -> task:1 > done
After WhenAll

Things to note:

  • even with real async, all tasks are started by the same thread;
  • in this particular run, two of the tasks are finished by the same thread (id: 5). This is not guaranteed at all: a task can be started on one thread and continue later on another thread in the pool.
tymtam