11

Is there a way to get data out of .NET about the number of Tasks (C# 5 or newer, so post-async/await) pending execution, and similar metrics, for use in diagnosing issues on production servers?

The case I'm discussing is an async-all-the-way-down system (for example, a massively parallel socket server where each request runs async from the start) where the initial Task either spawns multiple Tasks that each take time to process (or that each start more Tasks), or spawns Tasks of which some block (like third-party code) and some work async correctly. There are two situations I've seen that are difficult to diagnose usefully:

  • Under normal load, everything works fine, but if there are enough requests then the CPU very quickly jumps to 100% and all requests finish slower and slower. When the load lessens, the CPU stays at 100% until the majority of the pending tasks gradually finish, and then it drops back to normal levels.
  • Under normal load, everything works fine, but if there are enough requests then certain requests (all of which are properly async) either no longer finish at all or finish very slowly. When the load lessens, the CPU stays at 100% while they're all processed, but there are speed-bumps in the task completion rate where it slows down considerably for a short time.

I've tried writing up a simple test for this, but there's no apparent way to limit the number of executors, and the number of tasks I need to create to test it makes the resulting information very hard to parse. It's also quite hard not to interfere with the test itself by logging debug information. I will continue attempting to create a better test case and amend my question if needed.

From my understanding of the problem and the async task system, both of these are effectively contention on the executors that actually run the tasks.

The first case happens because more Tasks are being created than are actually finishing, in which case a counter of pending tasks would be useful to diagnose this even before the load is high enough to lock up the service.

The second case happens because a certain set of Tasks goes long enough without yielding that over time (with enough load) all the executors end up running these tasks at the same time. And as soon as one finishes, it'll process some Tasks, only to be replaced by another long-running task soon enough. In this case, a pending tasks counter would be useful, as well as some other metrics.
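
For illustration, here's a rough sketch of the kind of test I've been attempting (the counts and delays are arbitrary): every tenth task blocks its thread pool thread the way misbehaving third-party code would, while the rest yield properly.

using System;
using System.Threading;
using System.Threading.Tasks;

class StarvationDemo
{
    static void Main()
    {
        // Queue a mix of well-behaved async work and work that blocks its executor.
        for (var i = 0; i < 1000; i++)
        {
            var n = i;
            Task.Run(async () =>
            {
                if (n % 10 == 0)
                    Thread.Sleep(5000);   // misbehaving work: holds a thread pool thread
                else
                    await Task.Delay(50); // well-behaved work: yields its thread while waiting
                Console.WriteLine("Task " + n + " done at " + DateTime.Now.ToString("HH:mm:ss.fff"));
            });
        }
        Console.ReadLine(); // keep the process alive while the tasks drain
    }
}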

Is anything of the sort available, or is there some undocumented/hacky way to graft a bit of code to the start/end of every task started in an application, to make it log out/measure these things and throw warnings when the task number is exploding?

Mike Roibu
  • 307
  • 2
  • 13
  • For the first case you should be able to do it in your code. There has to be somewhere one big async call for each request (which in turns calls many methods and spans many tasks). You just need to monitor the task returned by that one big method to know how many requests are currently being processed by your system, and implement the throttling there. – Kevin Gosse May 18 '18 at 06:18
  • @KevinGosse The code that actually does that one big async call is third party, so not modifiable. But more importantly, that would only tell me how many "initial tasks" are running. But sometimes that initial task would start only one other task, sometimes it might start a hundred tasks, because it depends on the request being processed. – Mike Roibu May 19 '18 at 09:34
  • Programmers normally have to make a lot of effort to get everything they can get out of a processor, writing code that can execute truly concurrently is not so easy. Well, apparently not a problem you have to fix. Intentionally *not* using the processor to its full capacity is not exactly a feature, you could have spent a lot less money on the hardware. What you see is telling you that you have to spend more. That is entirely normal. Use a profiler to find inefficiencies, if any. – Hans Passant May 20 '18 at 09:19
  • @HansPassant A general increase in CPU usage due to high request load is not what I mean. This is a situation that can max out any amount of hardware you throw at it, because of a bug in the code (requests spawning way too many tasks, or certain tasks blocking when they shouldn't). Diagnosing where this bug lies without having to wait for it to trigger in production and then profile it on production servers (because it requires a larger load than is feasible on development servers) is why I'm looking for performance metrics instead. Seeing the number of pending tasks jump up works on dev too. – Mike Roibu May 20 '18 at 09:27
  • I wouldn't attempt this in production, but as far as undocumented/hacky ways, check out [this method](https://stackoverflow.com/a/36415711/442466) of replacing existing method implementations in memory during runtime. You might be able to mess around with Task methods using that. I've had mixed success with that in similar situations in the past. – S.C. May 21 '18 at 20:11
  • @S.C. Oh man, that really is hacky. Potentially it'd let us monitor things on a test machine at least. I'll see if I can figure something out with it, thanks. – Mike Roibu May 22 '18 at 07:00
  • @MikeRoibu As long as you're going to run on a test environment have you thought about locally building a customized version of the open source [.Net Core](https://github.com/dotnet/core) and making those task mods more formally? – S.C. May 22 '18 at 07:07
  • @S.C. I considered this, but besides not being able to use .NET Core (the socket server runs our code by just taking it as a DLL and it very specifically uses .NET 4.5 internally) it feels like unless the production server does the same, it'll only be marginally helpful as a test environment. This is also why the hacky solution might work on a test machine (ie. not the test environment itself, but a second test environment) but only if the production server doesn't. – Mike Roibu May 22 '18 at 13:10

5 Answers

9

You can inherit a class from EventListener to handle the events that the Task Parallel Library produces. You can probably count queued and running tasks this way, and store profiling information associated with a task in a ConcurrentDictionary. However, there are complications, such as the non-uniqueness of task ids and the performance implications of this profiling.

Example implementation:

using System;
using System.Diagnostics.Tracing;
using System.Linq;
using System.Text;

public class TplEventListener : EventListener
{
    // GUID of the TPL ETW event provider (TplEtwProvider).
    static readonly Guid _tplSourceGuid = new Guid("2e5dba47-a3d2-4d16-8ee0-6671ffdcd7b5");
    readonly EventLevel _handledEventsLevel;

    public TplEventListener(EventLevel handledEventsLevel)
    {
        _handledEventsLevel = handledEventsLevel;
    }

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Guid == _tplSourceGuid)
            EnableEvents(eventSource, _handledEventsLevel);
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventSource.Guid != _tplSourceGuid)
            return;

        switch (eventData.EventId)
        {
            // TODO: Add case for each relevant EventId (such as TASKSCHEDULED_ID and TASKWAITBEGIN_ID)
            // and explore relevant data (such as task Id) in eventData.Payload. Payload is described by 
            // eventData.PayloadNames.
            // For event ids and payload meaning explore TplEtwProvider source code 
            // (https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/TPLETWProvider.cs,183).
            default:
                var message = new StringBuilder();
                message.Append(eventData.EventName);
                message.Append("(");
                message.Append(eventData.EventId);
                message.Append(") { ");
                if (!string.IsNullOrEmpty(eventData.Message))
                {
                    message.Append("Message = \"");
                    message.AppendFormat(eventData.Message, eventData.Payload.ToArray());
                    message.Append("\", ");
                }
                for (var i = 0; i < eventData.Payload.Count; ++i)
                {
                    message.Append(eventData.PayloadNames[i]);
                    message.Append(" = ");
                    message.Append(eventData.Payload[i]);
                    message.Append(", ");
                }
                message[message.Length - 2] = ' ';
                message[message.Length - 1] = '}';
                Console.WriteLine(message);
                break;
        }
    }
}

Initialize and store a new TplEventListener(EventLevel.LogAlways) in each AppDomain and you will get a log similar to:

NewID(26) { TaskID = 1 }
TaskScheduled(7) { Message = "Task 1 scheduled to TaskScheduler 1.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 1, CreatingTaskID = 0, TaskCreationOptions = 8192 }
NewID(26) { TaskID = 2 }
TraceOperationBegin(14) { TaskID = 2, OperationName = Task.ContinueWith: < SendAsync > b__0, RelatedContext = 0 }
TaskStarted(8) { Message = "Task 1 executing.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 1 }
AwaitTaskContinuationScheduled(12) { OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, ContinuwWithTaskId = 2 }
NewID(26) { TaskID = 3 }
TraceOperationBegin(14) { TaskID = 3, OperationName = Async: < Main > d__3, RelatedContext = 0 }
NewID(26) { TaskID = 4 }
TaskWaitBegin(10) { Message = "Beginning wait (2) on Task 4.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 4, Behavior = 2, ContinueWithTaskID = 3 }
TaskWaitBegin(10) { Message = "Beginning wait (1) on Task 3.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 3, Behavior = 1, ContinueWithTaskID = 0 }
TraceSynchronousWorkBegin(17) { TaskID = 1, Work = 2 }
TraceSynchronousWorkEnd(18) { Work = 2 }
TraceOperationEnd(15) { TaskID = 1, Status = 1 }
RunningContinuation(20) { TaskID = 1, Object = 0 }
TaskCompleted(9) { Message = "Task 1 completed.", OriginatingTaskSchedulerID = 1, OriginatingTaskID = 0, TaskID = 1, IsExceptional = False }
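
If you just need a pending-task counter rather than a log, the switch above can increment and decrement a counter instead. A rough sketch, using the event IDs visible in the log output (7 = TaskScheduled, 9 = TaskCompleted); verify them against the TplEtwProvider source linked in the code comments before relying on them:

// Inside TplEventListener; requires using System.Threading.
static int _pendingTasks;

public static int PendingTasks
{
    get { return Volatile.Read(ref _pendingTasks); }
}

protected override void OnEventWritten(EventWrittenEventArgs eventData)
{
    if (eventData.EventSource.Guid != _tplSourceGuid)
        return;

    switch (eventData.EventId)
    {
        case 7: // TaskScheduled: a task was queued to a TaskScheduler
            Interlocked.Increment(ref _pendingTasks);
            break;
        case 9: // TaskCompleted: a scheduled task finished executing
            Interlocked.Decrement(ref _pendingTasks);
            break;
    }
}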

Leonid Vasilev
  • 11,910
  • 4
  • 36
  • 50
  • This seems exactly what I was looking for. Won't be able to test it before the bounty expires so I'll accept it now. Thanks! – Mike Roibu May 25 '18 at 19:26
3

In a production environment the Metrics.NET library is handy. You can instrument code and periodically write the collected data to a local file or a database. In a development environment you can use the Visual Studio profiler to explore CPU and address space usage. See the .NET Memory Allocation Profiling with Visual Studio 2012 article by Stephen Toub.

Relevant excerpt from Metrics.NET wiki:

The Metrics.NET library provides five types of metrics that can be recorded:

  • Meters record the rate at which an event occurs
  • Histograms measure the distribution of values in a stream of data
  • Timers keep a histogram of the duration of a type of event and a meter of the rate of its occurrence
  • Counters are 64-bit integers that can be incremented or decremented
  • Gauges are instantaneous values

And instrumentation example:

using Metrics;

public class SampleMetrics
{
    private readonly Timer timer = Metric.Timer("Requests", Unit.Requests);
    private readonly Counter counter = Metric.Counter("ConcurrentRequests", Unit.Requests);

    public void Request(int i)
    {
        this.counter.Increment();
        using (this.timer.NewContext()) // measure until disposed
        {
            // do some work
        }
        this.counter.Decrement();
    }
}
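
A Gauge maps well to the pending-task count from the question. A minimal sketch, where PendingTaskCounter.Count is a stand-in for whatever counter you maintain yourself (for example via the EventListener approach from the other answer):

// PendingTaskCounter is hypothetical; substitute your own counter.
// Metrics.NET polls the lambda whenever the gauge value is reported.
Metric.Gauge("PendingTasks", () => PendingTaskCounter.Count, Unit.Items);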

Leonid Vasilev
  • 11,910
  • 4
  • 36
  • 50
  • This seems irrelevant to my question. I don't see a way to hook it into every task started, and there is no pending task metric or anything similar. If I were able to write code to add counters around every single task, then the library wouldn't be helpful as we would have our own metrics logger. – Mike Roibu May 22 '18 at 06:58
  • Number of running tasks and number of executing method groups and lambdas must correlate because that is the only thing that tasks normally do. You can use decorators and interceptors to instrument code (similar to `ProfiledDbConnection` in MiniProfiler). – Leonid Vasilev May 22 '18 at 08:20
  • Not the case if you have other code that uses method groups and lambdas, which most production code will have. Instrumentation won't help if it means going through every task to add it (even as a decorator) and there's no way to intercept task creation. I'm looking for a generic solution. If there isn't one, then we'll implement a custom solution on our own without needing a third party library. I do appreciate the input, but it is completely irrelevant as-is, since I very specifically asked for generic solutions. – Mike Roibu May 22 '18 at 13:07
1

It seems Leonid Vasilev's answer is enough for you, but I want to share my experience when it comes to Tasks failing or taking longer than usual.

The biggest culprit here is context switching: the more tasks you fire up, the more contexts the CPU has to keep track of. Believe me, it can be more efficient for a CPU to run 100 heavy tasks than the same work split across many light ones.

For me the trick was to analyze the load patterns based on the requests submitted (we use a message queue) and keep to a sweet spot based on those patterns. I also did a manual forced GC collection at the end of every task.

I know you're looking for profiling tools, but I thought this could help. For these kinds of issues I believe it's good to focus on the incoming traffic first, instead of analyzing what we do with that traffic.

Dharani Kumar
  • 457
  • 2
  • 8
0

The simplest thing that would work would be to create alternatives to Task.Run, Task.Start, etc. that 1) call the real run/start and 2) record information in a ConcurrentDictionary (e.g. the task itself, thread, etc.)

Effectively, you're writing a specialized profiler:

using System;
using System.Collections.Concurrent;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class TaskExtensions
{
    // Keyed by caller info plus task id, since task ids alone aren't guaranteed unique.
    public static readonly ConcurrentDictionary<string, Entry> Tasks =
        new ConcurrentDictionary<string, Entry>();

    public static Task Run(
        Action action,
        [CallerMemberName] string caller = null,
        [CallerFilePath] string callerFile = null,
        [CallerLineNumber] int lineNumber = 0)
    {
        var task = Task.Run(action);
        var key = callerFile + ":" + lineNumber + ":" + caller + ":" + task.Id;
        Tasks.TryAdd(key, new Entry
        {
            Task = task,
            CallerMemberName = caller,
            CallerFilePath = callerFile,
            CallerLineNumber = lineNumber,
            ThreadId = Thread.CurrentThread.ManagedThreadId,
            Started = DateTime.UtcNow
        });
        return task;
    }

    // etc. for Task.Start, the other Task.Run overloads, ...
}

public class Entry
{
    public Task Task { get; set; }
    public string CallerMemberName { get; set; }
    public string CallerFilePath { get; set; }
    public int CallerLineNumber { get; set; }
    public int ThreadId { get; set; }
    public DateTime Started { get; set; }
    public DateTime Stopped { get; set; }
}

We use the caller info as part of the key because task IDs aren't guaranteed to be unique.

In another thread or task, iterate over all the tasks checking their status (IsCompleted, IsFaulted, IsCanceled) and recording metrics such as how long they've been running, etc.

You will likely want a short delay between polls, so the duration of that delay bounds the precision of the metrics. Because the polls run in their own task, your main-line code shouldn't be greatly impacted, and you should be able to get a picture of what's going on.
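
A rough sketch of such a polling loop over the dictionary above (requires using System.Linq; the one-second interval is arbitrary):

static async Task MonitorLoop(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        foreach (var entry in TaskExtensions.Tasks.Values)
        {
            // Stamp the stop time the first time we see the task completed.
            if (entry.Task.IsCompleted && entry.Stopped == default(DateTime))
                entry.Stopped = DateTime.UtcNow;
        }

        var pending = TaskExtensions.Tasks.Values.Count(e => !e.Task.IsCompleted);
        Console.WriteLine("Pending tasks: " + pending);

        // Poll precision is bounded by this delay.
        await Task.Delay(TimeSpan.FromSeconds(1), token);
    }
}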

As an aside, since you mentioned sockets, you may be facing a situation where the sockets go into TIME_WAIT. When this happens you can experience the slowdowns you're talking about. I've seen this before, and it definitely occurs under load.

To avoid putting the metrics recording into production, wrap the metrics code in a compiler directive and create a build configuration for your lower (test) environments that defines that symbol.
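
For example, with a hypothetical TASK_METRICS symbol defined only in the test build configuration:

public static Task Run(Action action /* , caller info as above */)
{
    var task = Task.Run(action);
#if TASK_METRICS
    // Compiled in only for builds that define TASK_METRICS.
    Tasks.TryAdd(task.Id.ToString(), new Entry { Task = task, Started = DateTime.UtcNow });
#endif
    return task;
}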

Kit
  • 20,354
  • 4
  • 60
  • 103
  • This seems well thought out, and thanks, however if you're using async/await most times you won't have a Task.Run, you just call the method marked async and the compiler handles everything. If not for that, it would help. The socket tip is also something we ran into before but that's easy to diagnose. – Mike Roibu May 25 '18 at 19:18
0

To my understanding, you could add some kind of logging (DB logging preferred) that records the method name, date/time, and session ID before any await is called, and clears the record after the await statement: when the await completes, the record is removed (or inserted/updated with the awaited time). That way, the number of sessions awaiting each method could be analyzed in real time. I hope this works. I have tried it using just text files being created and deleted, and it worked.
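
A minimal sketch of that idea as an extension method, where AwaitLog is a placeholder for your DB or file logging:

using System;
using System.Threading.Tasks;

public static class AwaitLogging
{
    public static async Task<T> Logged<T>(this Task<T> task, string methodName, string sessionId)
    {
        // Insert a record before awaiting; remove it (or update it with the
        // awaited time) once the await completes.
        var recordId = AwaitLog.Insert(methodName, sessionId, DateTime.UtcNow); // hypothetical logger
        try
        {
            return await task;
        }
        finally
        {
            AwaitLog.Remove(recordId, DateTime.UtcNow); // hypothetical logger
        }
    }
}

// Usage: var data = await LoadDataAsync().Logged("LoadDataAsync", sessionId);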

Jonny Boy
  • 192
  • 1
  • 1
  • 10