2

This little console app is a proof of concept for making thousands of database calls. The idea is that we want many calls to be happing simultaneously. No need to wait for one call to finish before starting the next.

At first, this (see below) seemed like a good approach, but when we applied this to the actual database calls, what we're seeing is that it seems to stack the processes. Meaning, it Starts all of them, but none of them Finish until all of them have been started.

I would expect (and want) some of the calls to be finishing before others have started. But that doesn't seem to be the case.

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("starting");
        DatabaseCallsAsync().Wait();
        Console.WriteLine("ending"); // Must not fire until all database calls are complete.
        Console.Read();
    }

    static async Task DatabaseCallsAsync()
    {
        List<int> inputParameters = new List<int>();
        for (int i = 0; i < 100; i++)
        {
            inputParameters.Add(i);
        }
        await Task.WhenAll(inputParameters.Select(x => DatabaseCallAsync($"Task {x}")));
    }

    static async Task DatabaseCallAsync(string taskName)
    {
        Console.WriteLine($"{taskName}: start");
        await Task.Delay(1000);
        Console.WriteLine($"{taskName}: finish");
    }
}

How can this be adjusted so that some calls finish without waiting for all of them to start?

Casey Crookston
  • 13,016
  • 24
  • 107
  • 193
  • How are you measuring when they are finishing? Just from `Console.WriteLine($"{taskName}: finish")`? – Gabriel Luci Aug 16 '19 at 15:36
  • Starting an async task that doesn't involve another thread (or the OS tracking I/O completion) has no means of starting to execute until you `await` the task. Awaiting actually returns from the method and allows other messages to be pumped, including those that enter the tasks you created. When a "resume" message is handled upon completion of the awaited task(s), it calls back into the method and resumes after the await. That's why they're "stacking". – madreflection Aug 16 '19 at 15:37
  • @GabrielLuci, in the real application, we call a stored proc which returns an integer. Once we have that integer, it's done. But regardless, this little PoC console app behaves the same way. Even if you shorten the `await Task.Delay(1000)` to 100 instead of 1000, none of them finish until all of them have started. – Casey Crookston Aug 16 '19 at 15:39
  • @madreflection... ok, thank you. I *mostly* understood that. Based on your comment, I'll play around. But, if you could suggest an alternate approach in the meantime, I'd be grateful! – Casey Crookston Aug 16 '19 at 15:41
  • @madreflection That's not correct. A `Task` starts as soon as it's created. You don't need to `await` it before it starts. This can be seen if you did `var tasks = inputParameters.Select(x => DatabaseCallAsync($"Task {x}")).ToList();` and *not* await them. All the tasks will still run to completion. – Gabriel Luci Aug 16 '19 at 15:41
  • I should maybe mention.. when I execute the SP in question via SSMS, it finishes in 00:00:00. And when I make an asynchronous call to it in C#, with a stopwatch around it, it makes the call in milliseconds. So this is not a long, slow SP. – Casey Crookston Aug 16 '19 at 15:44
  • @GabrielLuci: No, you're right. I was trying to put more emphasis on the fact that they're not background threads the way it's being done here. It *starts* and runs up until the next await, and if *that* task is not complete, it returns, and so on. Without an await, a method would run synchronously, and the compiler even warns you of that. – madreflection Aug 16 '19 at 15:46
  • @CaseyCrookston: I *just* noticed that you're not reifying your sequence of tasks. Put `ToList()` after the `Select` and they *should* at least hit the first `Console.WriteLine` call. In retrospect, that may be the difference between what you expected and what you observed. – madreflection Aug 16 '19 at 15:47
  • @madreflection, that for sure worked in the console PoC app! Lemme go try that for real in the actual app. Thanks! – Casey Crookston Aug 16 '19 at 15:50
  • @madreflection - no change in the real application. I think Krur may be onto something. The PoC app works as expected. The difference is the actual database call. For that I'm using Dapper's QueryAsync. Something about that call seems to not release the process as expected. I think maybe this entire post might be a red-hearing. – Casey Crookston Aug 16 '19 at 16:08
  • I made a followup post with some of the code from the actual application, and not just this Console app proof of concept: https://stackoverflow.com/questions/57528561/trying-to-make-thousands-of-database-calls-simultaneously-they-stack-instead-o – Casey Crookston Aug 16 '19 at 17:03

3 Answers3

4

It's important to note a couple things:

  1. async methods start running synchronously. The magic happens at await, if await is given an incomplete Task.
  2. Asynchronous != parallel. Running something asynchronously just lets the thread go and do something else while it is waiting for a reply from somewhere. It doesn't mean that multiple things are happening at the same time.

With those things in mind, this is what's happening in your case when it loops through all the tasks you've created:

  1. All of the tasks are put on the "to do" list.
  2. Task 1 is started.
  3. At await, an incomplete Task is returned, and the rest of the method is put on the "to do" list.
  4. The thread realizes there is nothing to do and moves on to the next thing on the "to do" list, which is to start the next Task.

At step 4, the next thing on the "to do" list will always be the next Task in the list until there is nothing left in the list. Only then, the next thing on the "to do" list is the continuation of the tasks that have completed, in the order they completed.

All of this happens on the same thread: it is asynchronous, not parallel.

But! If you actually use SQL calls (and you make a new connection for each task, since a single connection can only run one query at a time - unless you enable Multiple Active Result Sets) and monitor SQL, you will see those calls coming in and likely finishing before all of them have started, because SQL runs queries in parallel. It's only that the continuation of the C# methods won't start until all the tasks have started.

If you are truly looking to run these in parallel, then you need multi-threading. You can look at Parallel.ForEach (examples here), but that is not asynchronous. It will create a thread for each instance and the thread will block until it's complete. That's not a big deal in a desktop app, but in ASP.NET, threads are finite, so you need to be careful.

There is a big discussion of this here, but I particularly like this answer, which is not multi-threading, but gives a way to throttle your tasks. So you can tell it to start x number of tasks, and as each task finishes, start the next until all of them have run. For your code, that would look something like this (running 10 tasks at a time):

static async Task DatabaseCallsAsync()
{
    List<int> inputParameters = new List<int>();
    for (int i = 0; i < 100; i++)
    {
        inputParameters.Add(i);
    }

    await RunWithMaxDegreeOfConcurrency(10, inputParameters, x => DatabaseCallAsync($"Task {x}"));
}

static async Task DatabaseCallAsync(string taskName)
{
    Console.WriteLine($"{taskName}: start");
    await Task.Delay(1000);
    Console.WriteLine($"{taskName}: finish");
}

public static async Task RunWithMaxDegreeOfConcurrency<T>(
     int maxDegreeOfConcurrency, IEnumerable<T> collection, Func<T, Task> taskFactory)
{
    var activeTasks = new List<Task>(maxDegreeOfConcurrency);
    foreach (var task in collection.Select(taskFactory))
    {
        activeTasks.Add(task);
        if (activeTasks.Count == maxDegreeOfConcurrency)
        {
            await Task.WhenAny(activeTasks.ToArray());
            //observe exceptions here
            activeTasks.RemoveAll(t => t.IsCompleted); 
        }
    }
    await Task.WhenAll(activeTasks.ToArray()).ContinueWith(t => 
    {
        //observe exceptions in a manner consistent with the above   
    });
}
Gabriel Luci
  • 38,328
  • 4
  • 55
  • 84
  • Phew. Wow. Thank you! I'm reading and re-reading until I understand. Then I'll test and play around :) – Casey Crookston Aug 16 '19 at 16:23
  • It is hard to wrap your head around it, at first :) – Gabriel Luci Aug 16 '19 at 16:34
  • @Gabrial, this does indeed work for the console app. The 11th task finishes after the first 10 start. And that will adjust accordingly as I play with maxDegreeOfConcurrency. Thank you! However, I am still getting the same kind of results with the real database calls. So while THIS seems to be working, I am still doing something wrong with the actual database calls. I am going to start a new post for that. And if you don't mind taking a look, I'll add a comment here with a link to it when it's up. – Casey Crookston Aug 16 '19 at 16:35
  • Are you using the same SQL connection (or same `DbContext`) in each iteration? – Gabriel Luci Aug 16 '19 at 16:37
  • 1
    If you enable `MultipleActiveResultSets` on the connection, surely you could have multiple queries on the connection, or else you couldn't have multiple active result sets, right? (MARS and `NextResult` are unrelated, in case anyone was wondering about that.) That leads me to wonder how many active result sets it could really handle. – madreflection Aug 16 '19 at 16:38
  • @madreflection That sounds like that would work! I didn't know that was a thing. I'll update my answer. – Gabriel Luci Aug 16 '19 at 16:40
  • I am creating and killing the SQL connection inside of DatabaseCallAsync() – Casey Crookston Aug 16 '19 at 16:41
  • I made a followup post with some of the code from the actual application, and not just this Console app proof of concept: https://stackoverflow.com/questions/57528561/trying-to-make-thousands-of-database-calls-simultaneously-they-stack-instead-o – Casey Crookston Aug 16 '19 at 17:03
  • I just realized that `RunWithMaxDegreeOfConcurrency` returns a single task, not a list, so there's no need to use `Task.WhenAll` (but it won't make any difference to how it runs). – Gabriel Luci Aug 16 '19 at 17:21
1

Your delay is to high to get the expected result. All Requests are started before the first one is finished. This depends on your System.

await Task.Delay(50);

Worked for me.

Krur Schak
  • 47
  • 5
  • Yeah, I think you are on to something here, at least for the console Proof of Concept app. But in the real app, we are making 4,151 database calls. And all 4,151 of them start before a single one finishes. And the call to the database only takes less than one second. So something is off in my code. I'm going to try what madreflection suggested in the comments and add a .ToList() after the select. – Casey Crookston Aug 16 '19 at 15:52
  • Thats right. It depends on your SQL Server. I have to search for the right answer on my own but i know that not all SQL Server versions can handle Tasks async. For MySQL you can test it with MySQL Workbench und for MSSQL with SQL Management Studio. Just open some Tabs and put some requests in. Some long running and some that run faster and look what happens. But your Console Test App does what you expect. – Krur Schak Aug 16 '19 at 16:02
  • ok, so it's not the code, or the way I've structured the async calls. It's the database call itself that is somehow not releasing the process. – Casey Crookston Aug 16 '19 at 16:06
  • I tried this with `await Task.Delay(1)` and they all still started before any of them finished. – Gabriel Luci Aug 16 '19 at 16:19
-2

To get the effect i think you're after i used:

    static async Task DatabaseCallsAsync()
    {
        List<Task> inputParameters = new List<Task>();
        Parallel.For(0, 100, (i) => inputParameters.Add(DatabaseCallAsync($"Task {i}")));
        await Task.WhenAll(inputParameters);
    }

    static async Task DatabaseCallAsync(string taskName)
    {
        Console.WriteLine($"{taskName}: start");
        await Task.Delay(50);
        Console.WriteLine($"{taskName}: finish");
    }

BUT If you're gonna make 4151 requests then you should probably extend parallel.for with ParallelOptions

For a non Parallel approach i used:

    static async Task DatabaseCallsAsync()
    {
        List<Task> inputParameters = new List<Task>();
        for (int i = 0; i < 100; i++)
        {
            inputParameters.Add(DatabaseCallAsync($"Task {i}"));
        }
        await Task.WhenAll(inputParameters);
    }

However, doing it this way i had to set Task.Delay to 10 because starting the work for 100 was done faster than 50 milliseconds(Initial Task.Delay)

D. Dahlberg
  • 156
  • 1
  • 9
  • `Parallel.For` is for **CPU bound** work. Database call is **I/O bound** work. Do not waste tasks for async work. – Artur Aug 16 '19 at 16:12
  • Now, when I look at your code again, the only thing you are doing inside `Parallel.For` is adding task into collection that is absolutely useless. – Artur Aug 16 '19 at 16:21
  • @Artur Adding the task to the list will start the task. But all the tasks will still be started before any of them finish. – Gabriel Luci Aug 16 '19 at 16:23