0

I have implemented a small piece of asynchronous code and I face a weird behavior.

Basically what I want to do is run an initialization process for a set of multiple "clients" and I don't want to process them in a queue (some may take time to process, otherts may not). I just want them to be all finished in order to go to the next step. In order to avoid having too many processes running concurrently, I'm using a sempaphore (set temporarily to 2).

The problem I have is the execution seems to be done synchronously. Here a piece of the code (minified: no logging, try/catch, etc):

public void IntializeReportStructure(DateTimeOffset inReferenceDate)
{
    List<Task> theTaskCollection = new List<Task>();
    foreach (long theClientId in this.GetClientIdCollection())
    {
        Task theTask = this.InitializeClientAsync(theClientId, inReferenceDate.Year, inReferenceDate.Month);
        theTaskCollection.Add(theTask);
    }

    Task.WaitAll(theTaskCollection.ToArray());
}

private async Task InitializeClientAsync(long inClientId, int inReferenceYear, int inReferenceMonth)
{
    await this.Semaphore.WaitAsync();
    await this.InitializeClientReportAsync(inClientId);
    await this.InitializeClientSummaryAsync(inClientId, inReferenceYear, inReferenceMonth);
    this.Semaphore.Release();
}

Here's the logs content:

Waiting for semaphore for client 41. Current count is 2.
Entered semaphore for client 41. Current count is 1.
Semaphore has been released for client 41. Current count is 2.
Waiting for semaphore for client 12. Current count is 2.
Entered semaphore for client 12. Current count is 1.
Semaphore has been released for client 12. Current count is 2.
Waiting for semaphore for client 2. Current count is 2.
Entered semaphore for client 2. Current count is 1.
Semaphore has been released for client 2. Current count is 2.
Waiting for semaphore for client 261. Current count is 2.
Entered semaphore for client 261. Current count is 1.
Semaphore has been released for client 261. Current count is 2.
Waiting for semaphore for client 1. Current count is 2.
Entered semaphore for client 1. Current count is 1.
Semaphore has been released for client 1. Current count is 2.
Waiting for semaphore for client 6. Current count is 2.
Entered semaphore for client 6. Current count is 1.
Semaphore has been released for client 6. Current count is 2.

As you can see, each task is going to the semaphore step only when the previous one is finished. I said earlier the process was running asynchronously : within the loop (when adding the Task to the list), the task status is "RanToCompletion" and the value of IsCompleted is "true". I did not include the timestamp in the logs but we have some "client" that takes 15-20s to run and the process is "waiting" in the meantime.

Last detail, the two methods "InitializeClientReportAsync" and "InitializeClientSummaryAsync" are just doing async get data and async save data. Nothing weird over there.

The fun part is I am able to have an asynchronous result by adding "await Task.Delay(1);" right after the semaphore WaitAsync.

private async Task InitializeClientAsync(long inClientId, int inReferenceYear, int inReferenceMonth)
{
    await this.Semaphore.WaitAsync();
    await Task.Delay(1);
    await this.InitializeClientReportAsync(inClientId);
    await this.InitializeClientSummaryAsync(inClientId, inReferenceYear, inReferenceMonth);
    this.Semaphore.Release();
}

And here's the log content :

Waiting for semaphore for client 41. Current count is 2.
Waiting for semaphore for client 12. Current count is 1.
Waiting for semaphore for client 2. Current count is 0.
Waiting for semaphore for client 261. Current count is 0.
Waiting for semaphore for client 1. Current count is 0.
Waiting for semaphore for client 6. Current count is 0.
Entered semaphore for client 12. Current count is 0.
Entered semaphore for client 41. Current count is 0.
Semaphore has been released for client 12. Current count is 0.
Entered semaphore for client 2. Current count is 0.
Semaphore has been released for client 41. Current count is 0.
Entered semaphore for client 261. Current count is 0.
Semaphore has been released for client 261. Current count is 0.
Entered semaphore for client 1. Current count is 0.
Semaphore has been released for client 1. Current count is 0.
Entered semaphore for client 6. Current count is 0.
Semaphore has been released for client 2. Current count is 1.
Semaphore has been released for client 6. Current count is 2.

As you can see, processes are all going to the semaphore in the first place, before getting in. Aslo, we realize client "2" is the longest one to initialize. This is what I expect from the code in the first place.

I have no idea why a simple Task.Delay(1) changes the behavior of my process. It doesn't make much sense. Maybe I am missing something obvious because I'm too focused.

Edit

As mentionned in the comment, the problem comes from "async" Entity Framework code. I considered it as granted.

I simplified the content of underlying methods for the sake of the example. "IDbContext" is just an interface we use on the DbContext class of Entity Framework. In our case, async methods are directly the ones from the initial class.

private async Task InitializeClientAsync(long inClientId, int inReferenceYear, int inReferenceMonth)
{
    await this.Semaphore.WaitAsync();
    await this.SomeTest();
    this.Semaphore.Release();
}

private async Task SomeTest()
{
    using (IDbContext theContext = this.DataAccessProvider.CreateDbContext())
    {
        List<X> theQueryResult = await theContext.Set<X>().ToListAsync<X>();
        foreach (X theEntity in theQueryResult)
        {
            theEntity.SomeProperty = "someValue";
        }

        await theContext.SaveChangesAsync();
    }
}

So here, even if I retrieve hundreds of MB of entities and update them (all using await and async methods), everything stays completely synchronous. The next task will enter the semaphore only when this one will complete.

I added a Task.Delay(1) before SaveChangesAsync(), and the next Task enters the semaphore before the first task finishes. It confirms that everything is synchronous around here (except Task.Delay). But I can't say why though...

Tr00d
  • 17
  • 4
  • Task.Delay creates an extra task that will have to be awaited... and it is created inside the semaphore scope – Jonathan Alfaro Apr 03 '20 at 06:10
  • 4
    It would really help if you could provide a [mcve]. Without knowing what your other async methods are doing, or what the timing is, it's hard to tell what's going on. – Jon Skeet Apr 03 '20 at 06:11
  • Any task may complete **synchron** and f.i. `Semaphore.WaitAsync()` will do, when you do not have to wait for a semaphore slot https://github.com/Microsoft/referencesource/blob/master/mscorlib/system/threading/SemaphoreSlim.cs#L622 - and synchron code will block your loop code – Sir Rufo Apr 03 '20 at 06:14
  • @JonSkeet Sure, no problem. I will try to do this asap. Might be tricky though as the code inside is async EF. – Tr00d Apr 03 '20 at 06:16
  • BTW in which context is this code running? Console, Desktop (WPF, WinForms,...) or ASP.NET or ASP.NET core? – Sir Rufo Apr 03 '20 at 06:24
  • @SirRufo Ouch, I realize I forgot to mention the environment. Sorry about that. It's a .Net framework 4.6.2 console application – Tr00d Apr 03 '20 at 06:28
  • 2
    Unless the EF part is actually relevant to the problem, you should remove it. Isolate this to as simple code as possible. If you can't reproduce this *without* the EF code being involved, that may be the problem. – Jon Skeet Apr 03 '20 at 06:31
  • Indeed. It was also my next step. The more I look into it, the more I think the problem comes from this part as I can't see what's wrong in the code I showed up there. The behavior is basically like my EF code was purely synchronous, which would maybe explain why the Task.Delay changes the behavior. I will try and let you know – Tr00d Apr 03 '20 at 06:38
  • @Tr00d Can you post what `InitializeClientReportAsync` and `InitializeClientSummaryAsync` look like? The problem looks to me is they are actually synchronous. – weichch Apr 03 '20 at 07:39
  • I suggest you print out current thread id and see if they all run on the same thread which is likely the callers thread. – weichch Apr 03 '20 at 07:49
  • @weichch Will do shortly. But you are right : my "async" EF code is indeed synchronous :) – Tr00d Apr 03 '20 at 07:49
  • Also try to add an `Task.Yield()` to the begining of the `InitializeClientAsync` (even before waiting for the semaphore) this might help. – Ackdari Apr 03 '20 at 07:56
  • @Tr00d if the two methods do not have `await`, they will run synchronously event though you awaited on them. If that's the case, use `Task.Run` . – weichch Apr 03 '20 at 08:24
  • @weichch They do have the await keywork on AnyAsync, FirstOrDefaultAsync, ToListAsync or SaveChangesAsync. I provided a simplified example in the question itself. – Tr00d Apr 03 '20 at 08:36
  • You may want to checkout [this](https://stackoverflow.com/questions/36564596/how-to-limit-the-maximum-number-of-parallel-tasks-in-c-sharp) question on how to limit the number of concurrent tasks – Ackdari Apr 03 '20 at 08:46
  • @Ackdari Yep, I use the SemaphoreSlim already. But maybe I need to think about starting the task with a factory instead. I will take a look. Thanks! – Tr00d Apr 03 '20 at 08:56
  • @Tr00d also consider using just `Parallel.ForEach` like [this](https://stackoverflow.com/a/36564922/6560579) answer suggests – Ackdari Apr 03 '20 at 09:00
  • @Ackdari awaiting Task.Yield() seems to do the trick. I didn't know this one tbh. Thanks for the tip – Tr00d Apr 03 '20 at 09:13
  • You probably should call the method with `Task.Run` instead of using `Task.Yield` inside of it. The end-result is the same, but at least you leave the decision to the caller – Kevin Gosse Apr 03 '20 at 09:29

1 Answers1

0

Thanks to all your answers, I guess we were able have a closer view on what the problem was : not really an async-await issue, but more Entity Framework queries blocking the process.

I still don't fully understand why they don't run async though. My guess is they should... I will definitely investigate more on this.

Since the topic is now a bit different, I guess I can mark this question as answered.

Edit

Before closing the question, I'd like to give more detail about the remaining issue.

I made a small example with an EF6 async query (ToListAsync). Here, I expected to see in our logs "STARTED" first, shortly followed by "PENDING", and then "FINISHED" after the data has been retrieved.

private static async Task DoSomething()
{
    ILog theLogger = LogManager.GetLogger("test");
    using (Context theContext = new Context())
    {
        theLogger.Info("STARTING");
        Task<List<Transaction>> theTask = theContext.Set<Transaction>().ToListAsync();
        theLogger.Info("PENDING");
        var theResult = await theTask;
    }

    theLogger.Info("FINISHED");
}

2020-04-03 13:35:55,948 [1] INFO  test [(null)] - STARTING
2020-04-03 13:36:11,156 [1] INFO  test [(null)] - PENDING
2020-04-03 13:36:11,158 [1] INFO  test [(null)] - FINISHED

As you can see, "PENDING" happens after the data is retrieved (when all the work is done). So, the result is going to be the same whether you use async or not.

I tried the same example with a simple Task.Delay instead of the query.

private static async Task DoSomething()
{
    ILog theLogger = LogManager.GetLogger("test");
    using (Context theContext = new Context())
    {
        theLogger.Info("STARTING");
        Task theTask = Task.Delay(20000);
        theLogger.Info("PENDING");
        await theTask;
    }

    theLogger.Info("FINISHED");
}

2020-04-03 13:34:51,858 [1] INFO  test [(null)] - STARTING
2020-04-03 13:34:51,907 [1] INFO  test [(null)] - PENDING
2020-04-03 13:35:21,922 [5] INFO  test [(null)] - FINISHED

Here, everything is normal. The process is on hold only when meeting the await keyword.

Does anyone faced this behavior before? Is it normal?

Tr00d
  • 17
  • 4