Asynchronously saving to multiple tables in a single SQL Server Db

Question

I was able to populate multiple tables of a database in parallel using Task.WhenAll(tasks). I have a generic method for each table:

var tasks = new List<Task>()
        {
            CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>()),
            CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>()),
            CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>()),
            CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>())
        };

        await Task.WhenAll(tasks);

Now a new requirement asks me to record the number of new items returned by that async method, and I came across this post. I tried to do it the way it is described in the comments on the accepted answer, to get the number of new items:

   int newItems = 0;
   newItems += await CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
   newItems += await CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
   newItems += await CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
   newItems += await CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());

Using the second method, the database tables are not updated in parallel but one after another (sequentially).

Inside the CheckNewItems method, I'm using the SaveChangesAsync and AddRangeAsync methods in EF Core. I'm not sure why Task.WhenAll(tasks) gives the desired behavior and the second way doesn't, given that some of the comments in that post say you don't need Task.WhenAll to make async methods run in parallel.

I would like to get the results of each call to CheckNewItems, while they can still save to the database asynchronously and in parallel as usual. Thanks in advance for the insights and help :)

score 1 · Answer 1 · answered Jan 29 '20 at 01:12

How about:

// Task<int> (not Task) so that each task exposes its result
var tasks = new List<Task<int>>()
        {
            CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>()),
            CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>()),
            CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>()),
            CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>())
        };

await Task.WhenAll(tasks);
int newItems = 0;
foreach (var task in tasks)
{
    newItems += await task;
}

Jorge Santiago · Answer 2 · 2020-01-29T01:42:53.827

await does what it says: it suspends the current method until the awaited task has finished, and only then continues with the rest of the method. Task.WhenAll, on the other hand, takes tasks that have already been started and returns a single task that completes when the last of them is done, so the tasks can overlap.

Adding to a shared counter from several tasks at once isn't exactly easy, because the addition has to be an atomic operation. Luckily it is relatively straightforward with APIs like Interlocked.Add().

https://www.dotnetperls.com/interlocked
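
As a rough illustration of the Interlocked.Add() idea, here is a minimal sketch. SaveAndCountAsync is a hypothetical stand-in for CheckNewItems (it only simulates a database call with Task.Delay), and the row counts are made up for the example:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedSketch
{
    // Shared counter updated by several concurrently running tasks.
    private static int _newItems;

    // Hypothetical stand-in for CheckNewItems: pretends to save `count` rows.
    private static async Task SaveAndCountAsync(int count)
    {
        await Task.Delay(50);                   // simulated database round trip
        Interlocked.Add(ref _newItems, count);  // atomic add, safe across threads
    }

    public static async Task<int> RunAsync()
    {
        _newItems = 0;
        // ToList() forces every call to start before anything is awaited.
        var tasks = new[] { 3, 5, 7, 11 }
            .Select(c => SaveAndCountAsync(c))
            .ToList();
        await Task.WhenAll(tasks);
        return _newItems;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync()); // prints 26
    }
}
```

This only matters when the tasks themselves write to shared state; if each task simply returns its count, no atomic operation is needed.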

Edit: Kristjan's answer is a much simpler option, since you add up the counts after every task is done. You can use Task.Result instead of awaiting in the foreach loop, because once Task.WhenAll has completed there is no need to await an already completed task just to retrieve its value.
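
For what it's worth, when every task returns an int, Task.WhenAll can hand back the results directly as an array, so the loop can be skipped. A minimal sketch, assuming a hypothetical CheckNewItemsAsync that stands in for CheckNewItems<S, T> and just returns a made-up count after a simulated delay:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for CheckNewItems<S, T>: returns the number of new rows.
    private static async Task<int> CheckNewItemsAsync(int newRows)
    {
        await Task.Delay(50); // simulated database round trip
        return newRows;
    }

    public static async Task<int> CountNewItemsAsync()
    {
        // Calling the methods starts the work; nothing is awaited yet.
        var tasks = new List<Task<int>>
        {
            CheckNewItemsAsync(2),
            CheckNewItemsAsync(4),
            CheckNewItemsAsync(6),
            CheckNewItemsAsync(8)
        };

        // The generic overload of Task.WhenAll returns Task<int[]>,
        // with results in the same order as the input tasks.
        int[] counts = await Task.WhenAll(tasks);
        return counts.Sum();
    }

    public static async Task Main()
    {
        Console.WriteLine(await CountNewItemsAsync()); // prints 20
    }
}
```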

score 0 · Accepted Answer · answered Jan 29 '20 at 02:09

I don't know why this works as desired:

        var task1 = CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
        var task2 = CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
        var task3 = CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
        var task4 = CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());

        newItems += await task1;
        newItems += await task2;
        newItems += await task3;
        newItems += await task4;

And this one doesn't:

  newItems += await CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
  newItems += await CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
  newItems += await CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());
  newItems += await CheckNewItems<S, T>(async () => await GetPrimaryKeys<S, T>());

Anyway, I would be very grateful to anyone who can explain why. I think I need to read more about the TPL.

The first case is almost the same as in my answer, but without a list. When you call the method and store the returned task (var task1/2/3/4 = ...), the task is already hot (it may have started running). So by the time you get to newItems += await task1, all 4 tasks have potentially started running in parallel. You then wait for the result of each of them and use it. While you are awaiting task1, all 4 tasks are executing, so you get the parallelism. In the second case, however, you don't store the task in a variable, so on the first line you await the first task to completion before the second call is even made (and so on). - kkica, Jan 29 '20 at 12:09
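
The difference described in this comment can be seen with a small timing sketch. CheckNewItemsAsync is a hypothetical stand-in that only waits 200 ms and returns a made-up count; the real method, its generic parameters, and EF Core are deliberately left out:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class HotTaskSketch
{
    // Hypothetical stand-in for CheckNewItems<S, T>.
    private static async Task<int> CheckNewItemsAsync(int newRows)
    {
        await Task.Delay(200); // simulated database round trip
        return newRows;
    }

    // Each call is awaited before the next call is made: the delays add up.
    public static async Task<(int Total, TimeSpan Elapsed)> SequentialAsync()
    {
        var sw = Stopwatch.StartNew();
        int total = 0;
        total += await CheckNewItemsAsync(1);
        total += await CheckNewItemsAsync(2);
        total += await CheckNewItemsAsync(3);
        total += await CheckNewItemsAsync(4);
        return (total, sw.Elapsed);
    }

    // All four calls are made first, so the delays overlap.
    public static async Task<(int Total, TimeSpan Elapsed)> ConcurrentAsync()
    {
        var sw = Stopwatch.StartNew();
        var task1 = CheckNewItemsAsync(1);
        var task2 = CheckNewItemsAsync(2);
        var task3 = CheckNewItemsAsync(3);
        var task4 = CheckNewItemsAsync(4);
        int total = 0;
        total += await task1;
        total += await task2;
        total += await task3;
        total += await task4;
        return (total, sw.Elapsed);
    }

    public static async Task Main()
    {
        var sequential = await SequentialAsync();
        var concurrent = await ConcurrentAsync();
        Console.WriteLine($"sequential: {sequential.Total} items in {sequential.Elapsed.TotalMilliseconds:F0} ms");
        Console.WriteLine($"concurrent: {concurrent.Total} items in {concurrent.Elapsed.TotalMilliseconds:F0} ms");
    }
}
```

Both versions return the same total; only the elapsed time differs, with the sequential version taking roughly four times as long in this setup.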
