I encountered an interesting livelock situation that has to do with asynchrony.

Consider the code below, which causes a livelock and runs for about 1 minute even though the useful payload takes almost no time to run. The run time is about 1 minute because we hit the thread pool growth limit (around 1 thread per second): the 60 iterations below take around a minute, and 300 iterations would make it run for around 5 minutes.

This is not the trivial deadlock where we synchronously wait on an asynchronous operation in an environment whose SynchronizationContext allows scheduling jobs on a single thread only (e.g. WPF, WebAPI, etc.). The code below reproduces the issue in a console application, where no explicit SynchronizationContext is set and tasks are scheduled on the thread pool.

I know that the "solution" to this problem is "asynchrony all the way". In the real world, though, we might not know that somewhere deep inside, the developer of SyncMethod suppresses asynchrony by waiting on it in a blocking way, unleashing such issues (even if they did the trick of replacing the SynchronizationContext so that it at least does not deadlock).

What are your suggestions to deal with such an issue when "asynchrony all the way" is not an option? Is there something else rather than obvious "do not spawn so many tasks at once"?

void Main()
{
    List<Task> tasks = new List<Task>();

    for (int i = 0; i < 60; i++)
        tasks.Add(Task.Run(() => SyncMethod()));

    bool exit = false;

    Task.WhenAll(tasks.ToArray()).ContinueWith(t => exit = true);

    while (!exit)
    {
        Print($"Thread count: {Process.GetCurrentProcess().Threads.Count}");
        Thread.Sleep(1000);
    }
}

void SyncMethod()
{
    SomethingAsync().Wait();
}

async Task SomethingAsync()
{
    await Task.Delay(1);
    await Task.Delay(1); // extra puzzle -- why commenting one of these Delay will partially resolve the issue?

    Print("async done");
}

void Print(object obj)
{
    $"[{Thread.CurrentThread.ManagedThreadId}] {DateTime.Now} - {obj}".Dump();
}

Here is the output. Notice how all async continuations are stuck for almost a minute and then all of a sudden continue execution.

[12] 30.01.2018 23:34:36 - Thread count: 18 
[12] 30.01.2018 23:34:37 - Thread count: 32
[12] 30.01.2018 23:34:38 - Thread count: 33 -- THREAD POOL STARTS TO GROW
...
[12] 30.01.2018 23:35:18 - Thread count: 70
[12] 30.01.2018 23:35:19 - Thread count: 71
[12] 30.01.2018 23:35:20 - Thread count: 72 -- UNTIL ALL SCHEDULED TASKS CAN FIT
[8] 30.01.2018 23:35:20 - async done -- ALMOST A MINUTE AFTER START
[8] 30.01.2018 23:35:20 - async done -- THE CONTINUATIONS START TO GO THROUGH
...
[61] 30.01.2018 23:35:20 - async done
[10] 30.01.2018 23:35:20 - async done
Eugene D. Gubenkov
  • The obvious solution is the best. Threads are prohibitively expensive; async was invented to help you not make so many threads. If you find that you're spending a lot of time hiring workers and not a lot of time getting them to do work, then hire fewer workers and assign each of them more work! – Eric Lippert Jan 30 '18 at 20:14
  • What are you expecting to get in an answer? You know what the problem is, and you've stated that you're unwilling to fix it. There isn't anything else for anyone to help you with. Either fix the code to *not* require a bunch of work of the thread pool threads to do nothing productive, or deal with the added costs that it entails. – Servy Jan 30 '18 at 20:17
  • You can increase the limits of the thread pool or use your own task scheduler that uses more threads. That does not avoid the resource costs but it makes threadpool exhaustion more predictable and avoidable. – usr Jan 30 '18 at 21:07
  • Thank you for your answers! I think it's helpful to at least have this case documented for someone's reference. – Eugene D. Gubenkov Jan 31 '18 at 13:39
  • @EricLippert, can you please explain why getting rid of _one_ of two `Task.Delay` within `SomethingAsync` partially solves the issue -- continuations slowly start to run even though Thread Pool did not grow enough to contain all tasks yet? – Eugene D. Gubenkov Feb 16 '18 at 20:21

1 Answer

Answering the original question:

What are your suggestions to deal with such an issue when "asynchrony all the way" is not an option? Is there something else rather than obvious "do not spawn so many tasks at once"?

This is by no means a fix for the root cause, but a quantitative remedy: we can adjust the thread pool with SetMinThreads, increasing the number of threads that are created without a delay (much faster than the regular "injection rate", which on my setup is 1 thread pool thread per second).

The way it works in this setup is simple. We are wasting thread pool threads until the pool grows big enough to start executing the continuations. If we start with a big enough pool, we eliminate the period during which we are bound only by the artificial "injection rate", which tries to keep the number of threads low (this makes sense, as the thread pool is designed to run CPU-bound tasks rather than to sit blocked waiting on asynchronous operations).
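As a rough sketch of that adjustment, the snippet below raises only the worker-thread minimum and leaves the I/O completion-port minimum as it was. The class and method names are hypothetical, and the value 64 is just an assumption chosen to cover the 60 blocked tasks from the question plus a few spare threads for continuations; it is not a recommended setting.

```csharp
using System;
using System.Threading;

static class MinThreadsSetup
{
    // Hypothetical helper: raises the worker-thread minimum to at least
    // `desiredWorkers` and keeps the current I/O completion-port minimum.
    // Returns false if the thread pool rejected the new value.
    public static bool EnsureMinWorkerThreads(int desiredWorkers)
    {
        ThreadPool.GetMinThreads(out int workers, out int io);
        if (workers >= desiredWorkers)
            return true; // already high enough, nothing to change

        return ThreadPool.SetMinThreads(desiredWorkers, io);
    }

    static void Main()
    {
        // 64 is an assumed value for the repro in the question, not a general default.
        bool applied = EnsureMinWorkerThreads(64);

        ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
        Console.WriteLine($"applied: {applied}, min workers: {minWorkers}, min I/O: {minIo}");
    }
}
```

Calling this once at startup, before the tasks are spawned, should let the pool create threads up to that minimum on demand instead of waiting for the injection rate.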

I should also leave a warning note.

By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.

https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool.setminthreads?view=netframework-4.8
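For comparison, the other option mentioned in the question, "do not spawn so many tasks at once", might look like the following minimal sketch. It assumes a SemaphoreSlim as the gate; SyncMethod here is a hypothetical stand-in for the blocking method from the question, and the limit of half the minimum worker threads is an arbitrary assumption meant to leave some pool threads free for timer callbacks and continuations.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledBlocking
{
    // Hypothetical stand-in for the blocking SyncMethod from the question.
    static void SyncMethod() => Task.Delay(1).Wait();

    // Runs `count` blocking calls, letting at most `limit` of them occupy
    // thread pool threads at any moment. Returns how many calls completed.
    public static async Task<int> RunThrottledAsync(int count, int limit)
    {
        int completed = 0;
        using (var gate = new SemaphoreSlim(limit))
        {
            var runs = Enumerable.Range(0, count).Select(async _ =>
            {
                await gate.WaitAsync(); // waits for a slot without blocking a thread
                try
                {
                    await Task.Run(() => SyncMethod());
                    Interlocked.Increment(ref completed);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList(); // materialize so every run is started exactly once

            await Task.WhenAll(runs);
        }
        return completed;
    }

    static void Main()
    {
        // Assumption: keep the number of blocked threads below the pool minimum.
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        int limit = Math.Max(1, minWorkers / 2);

        // Blocking here is fine: Main runs on the main thread, not on a pool thread.
        int done = RunThrottledAsync(60, limit).GetAwaiter().GetResult();
        Console.WriteLine($"completed {done} calls with at most {limit} in flight");
    }
}
```

The blocking calls still waste threads while they wait, but the number wasted at any moment is capped, so the pool does not need to grow before continuations can run.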

There is also an interesting issue where Microsoft recommends increasing the "min threads" for ASP.NET as a performance/reliability improvement in some scenarios.

https://support.microsoft.com/en-us/help/821268/contention-poor-performance-and-deadlocks-when-you-make-calls-to-web-s

Interestingly, the problem described in the question is not purely imaginary. It is real, and it happens with well-known and widely used software. An example from my own experience is Identity Server 3.

https://github.com/IdentityServer/IdentityServer3.EntityFramework/issues/101

The implementation that has this caveat (we had to rewrite it to work around the problem for our production scenario):

https://github.com/IdentityServer/IdentityServer3.EntityFramework/blob/master/Source/Core.EntityFramework/Serialization/ClientConverter.cs

Another article that explains the issue in detail.

https://blogs.msdn.microsoft.com/vancem/2018/10/16/diagnosing-net-core-threadpool-starvation-with-perfview-why-my-service-is-not-saturating-all-cores-or-seems-to-stall/

As for the strange behavior with a single Task.Delay, where some async invocations complete with each newly injected thread pool thread: it seems to be caused by continuation inlining combined with the way Task.Delay and Timer are implemented. The call stack below shows that a newly created thread pool thread does some additional work for .NET timers when it is created, before it processes the thread pool queue (see System.Threading.TimerQueue.AppDomainTimerCallback).

   at AsynchronySamples.StrangeTimer.Program.d__2.MoveNext()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.InvokeMoveNext(Object stateMachine)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.MoveNextRunner.Run()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.c__DisplayClass4_0.b__0()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.ContinuationWrapper.Invoke()
   at System.Runtime.CompilerServices.TaskAwaiter.c__DisplayClass11_0.b__0()
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.ContinuationWrapper.Invoke()
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining, Task& currentTask)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task`1.TrySetResult(TResult result)
   at System.Threading.Tasks.Task.DelayPromise.Complete()
   at System.Threading.Tasks.Task.c.b__274_1(Object state)
   at System.Threading.TimerQueueTimer.CallCallbackInContext(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
   at System.Threading.TimerQueue.AppDomainTimerCallback(Int32 id)
   [Native to Managed Transition]   
   at kernel32.dll!74e86359()
   at kernel32.dll![Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
   at ntdll.dll!77057b74()
   at ntdll.dll!77057b44()  
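The inlining part can be observed in isolation. The sketch below is an illustration under assumptions, not a reproduction of the timer path above: it uses a TaskCompletionSource in place of Task.Delay and a dedicated thread in place of the timer callback, and all names are hypothetical. It reports which thread completed the task and which thread ran the code after the await; when the continuation is inlined, the two ids would be expected to match.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class InliningDemo
{
    // Awaits the gate and reports the thread that runs the code after the await.
    static async Task<int> AwaitAndReportThread(Task gate)
    {
        await gate;
        return Thread.CurrentThread.ManagedThreadId;
    }

    // Returns (id of the completing thread, id of the thread that ran the continuation).
    public static (int CompleterId, int ContinuationId) Run()
    {
        // Default options: continuations are allowed to run synchronously on the completing thread.
        var tcs = new TaskCompletionSource<bool>();
        Task<int> waiter = AwaitAndReportThread(tcs.Task);

        int completerId = 0;
        var completer = new Thread(() =>
        {
            completerId = Thread.CurrentThread.ManagedThreadId;
            tcs.SetResult(true); // an inlined continuation runs inside this call
        });
        completer.Start();
        completer.Join();

        return (completerId, waiter.Result);
    }

    static void Main()
    {
        var (completerId, continuationId) = Run();
        Console.WriteLine($"completed on thread {completerId}, continuation ran on thread {continuationId}");
    }
}
```

Creating the TaskCompletionSource with TaskCreationOptions.RunContinuationsAsynchronously should instead force the continuation onto the thread pool, which is the situation in which a starved pool delays it.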

Eugene D. Gubenkov