Debugging a failing, unobserved TPL task

Question

I am attempting to debug an intermittent issue I am observing via a failing integration test, but seem to be stuck between a rock and a hard place.

Something, somewhere, is creating a System.Threading.Tasks.Task that is subsequently failing and resulting in an unobserved task exception. In the resultant handler, I can see the task ID and the exception that caused it to fail. I have painstakingly analyzed my code and even followed my own advice and named all my tasks but have still not found the offending task. It doesn't seem like my code is creating it at all.

So I tried setting breakpoints on the Task constructors themselves. I can do this with function breakpoints (with locations such as "System.Threading.Tasks.Task.Task(System.Action)"). This works and the debugger breaks and shows me the assembly for the Task class. However, what I really need to do is correlate the ID of the Task to the ID of the Task that ends up failing.

So to that end, I try to output the Task.Id property in a trace point. But I get a message saying the method is optimized so the expression cannot be evaluated.

So I tried debugging with source. I set up all my symbols and what-have-you, but I tried absolutely everything I could, but it isn't working. After much googling I found that it isn't supported on the latest .NET 4.

Does anyone have any ideas on how I can proceed to diagnose this issue?

Can you set a condition on the function breakpoints that checks your task names instead of IDs? Also maybe breaking on `ContinueWith` instead of the `Task` constructors would help to narrow down the set of possible candidates. — anton.burger, Nov 20 '12 at 11:31
@shambulator: the task in question does not have a name because I haven't been able to track down where it is created in order to name it. That's the crux of the problem. — Kent Boogaart, Nov 20 '12 at 12:16
But if I understand the other comments correctly, you know which code it's running? What's preventing you from finding the sites which refer to that code (in order to construct tasks from it)? — anton.burger, Nov 20 '12 at 12:22
@shambulator: that's exactly what I've done and, yes, you would think that would result in a solution and fix. But it hasn't. Hence, my dilemma. Like I said, it seems that the task isn't even being created by my code, which sounds crazy, but is the only rational explanation I have so far. — Kent Boogaart, Nov 20 '12 at 13:22
Instead of a tracepoint, what about outputting the task ID using `Debug.WriteLine` or similar, along with anything else relevant, like the current stack? Is there a chance that a bug in your task-naming code is preventing some tasks from having the state reliably associated with them? If you're sure you've got debug output in place every time a task is created with that action, and you're still not seeing its ID come up, then there's something *really* weird at work. — anton.burger, Nov 20 '12 at 15:29

Kent Boogaart · Answer 1 · 2012-11-20T15:56:28.520

OK, I have tracked down this problem. The specifics of the bug are perhaps less interesting than the method I used to find it, but I will cover both in separate sections below.

The Problem

Here is part of the offending code:

private static Task<TSuccessor> ThenImpl<TAntecedent, TSuccessor>(Task<TAntecedent> antecedent, Func<Task<TAntecedent>, Task<TSuccessor>> getSuccessor, CancellationToken cancellationToken, TaskThenOptions options)
{
    antecedent.AssertNotNull("antecedent");
    getSuccessor.AssertNotNull("getSuccessor");

    var taskCompletionSource = new TaskCompletionSource<TSuccessor>();

    antecedent.ContinueWith(
        delegate
        {
            var evenOnFaulted = options.HasFlag(TaskThenOptions.EvenOnFaulted);
            var evenOnCanceled = options.HasFlag(TaskThenOptions.EvenOnCanceled);

            if (antecedent.IsFaulted && !evenOnFaulted)
            {
                taskCompletionSource.TrySetException(antecedent.Exception.InnerExceptions));
            }
            else if ((antecedent.IsCanceled || cancellationToken.IsCancellationRequested) && !evenOnCanceled)
            {
                taskCompletionSource.TrySetCanceled();
            }
            else
            {

This method supports my Then extension methods, which I've blogged about.

Further to the implementation in my blog post, I recently added the ability to run the "then continuation", as I call it, even if the antecedent task faults:

Task.Factory.StartNew(() => { throw new InvalidOperationException(); })
    .Then(() => Console.WriteLine("Executed"), TaskThenOptions.EvenOnFaulted);

This will result in the initial exception being ignored and "Executed" being output on the console. However, the problem is my ThenImpl does not observe the original exception. To that end, I changed this line:

if (antecedent.IsFaulted && !evenOnFaulted)

to this:

if (antecedent.Exception != null && !evenOnFaulted)

And now I don't get the problem.

Now, you may wonder why this was at all difficult to track down. The thing is, I have a lot of task compositional methods that facilitate advanced scenarios. Here's an actual snippet to give you some idea of the resultant power:

private Task OnConnectAsync(CancellationToken cancellationToken, object state)
{
    var firstAttempt = true;
    var retryOnFailureTask = TaskUtil
        .RetryOnFailure(
                () => TaskUtil.Delay(firstAttempt ? TimeSpan.Zero : this.reconnectDelay, cancellationToken)
                .Then(
                    x =>
                    {
                        if (!firstAttempt)
                        {
                            Interlocked.Increment(ref this.connectionAttempts);
                        }

                        firstAttempt = false;
                    })
                .Then(x => this.loggerService.Debug("Attempting to connect communications service (attempt #{0}).", this.connectionAttempts), cancellationToken)
                .Then(x => this.communicationsService.ConnectAsync(cancellationToken), cancellationToken)
                .Then(x => this.loggerService.Debug("Successfully connected communications service (attempt #{0}).", this.connectionAttempts), cancellationToken)
                .Then(x => this.communicationsService.AuthenticateAsync(cancellationToken), cancellationToken)
                .Then(x => this.loggerService.Debug("Successfully authenticated communications service (attempt #{0}).", this.connectionAttempts), cancellationToken)
                .Then(x => this.ReviveActiveStreamsAsync(cancellationToken), cancellationToken)
                .Then(x => this.loggerService.Debug("Successfully revived streams (attempt #{0}).", this.connectionAttempts), cancellationToken),
            null,
            cancellationToken);

    return retryOnFailureTask;
}

Note the custom RetryOnFailure, Then, and Delay methods. That's a good taste of what I'm talking about.

The drawback of this, of course, is tracking down problems when they occur. I can't help but feel the TPL does a poor job in this regard. To my mind, each Task should include information about who created it. At the very least, there should be hooks in the TPL (eg. TaskCreated event) so that developers can supplement tasks with their own debug information. The situation may well have improved with .NET 4.5 - I am using .NET 4.0 though.

The Method

The key to tracking down the problem was in laboriously wrapping each Task that I created with a TaskCompletionSource that wraps any exception with a supplementary message. For example, here is a ToBooleanTask extension method that I have beforehand:

public static Task<bool> ToBooleanTask(this Task task)
{
    var taskCompletionSource = new TaskCompletionSource<bool>();

    task.ContinueWith(
        x =>
        {
            if (x.IsFaulted)
            {
                taskCompletionSource.TrySetException(x.Exception.GetBaseException());
            }
            else if (x.IsCanceled)
            {
                taskCompletionSource.TrySetCanceled();
            }
            else
            {
                taskCompletionSource.TrySetResult(true);
            }
        });

    return taskCompletionSource.Task;
}

And here it is after making this change:

public static Task<bool> ToBooleanTask(this Task task)
{
    var taskCompletionSource = new TaskCompletionSource<bool>();

    task.ContinueWith(
        x =>
        {
            if (x.IsFaulted)
            {
                taskCompletionSource.TrySetException(new InvalidOperationException("Failure in to boolean task", x.Exception.GetBaseException()));
            }
            else if (x.IsCanceled)
            {
                taskCompletionSource.TrySetCanceled();
            }
            else
            {
                taskCompletionSource.TrySetResult(true);
            }
        });

    return taskCompletionSource.Task;
}

In this case I already have a TaskCompletionSource, so it was straightforward. In other cases, I had to explicitly create a TaskCompletionSource and forward any fault/cancel/result from the underlying Task onto the TaskCompletionSource.

Aside: you may wonder at the use of the ToBooleanTask extension method. It's incredibly useful where you want to implement a single method that handles both generic and non-generic tasks. You can implement the generic version and then have your non-generic overload call ToBooleanTask in order to create a generic task, which can then be passed onto your generic overload.

Once I had gone through all the possible culprits and supplemented them as per above, I re-ran my test until it failed and noticed that it was, indeed ToBooleanTask that was creating the task that was not being observed. Therefore, I modified it to:

public static Task<bool> ToBooleanTask(this Task task)
{
    var stackTrace = new System.Diagnostics.StackTrace(true);
    var taskCompletionSource = new TaskCompletionSource<bool>();

    task.ContinueWith(
        x =>
        {
            if (x.IsFaulted)
            {
                taskCompletionSource.TrySetException(new InvalidOperationException("Failure in to boolean task with stack trace: " + stackTrace, x.Exception.GetBaseException()));
            }
            else if (x.IsCanceled)
            {
                taskCompletionSource.TrySetCanceled();
            }
            else
            {
                taskCompletionSource.TrySetResult(true);
            }
        });

    return taskCompletionSource.Task;
}

This would give me a full stack trace when the failure occurred. I re-ran my test until it failed, and - hooray! - got the information I needed to track down the issue:

Failure in to boolean task with stack trace:    at XXX.Utility.Tasks.TaskExtensions.ToBooleanTask(Task task) in C:\XXX\Src\Utility\Tasks\TaskExtensions.cs:line 110
   at XXX.Utility.Tasks.TaskExtensions.Then(Task antecedent, Func`2 getSuccessor, CancellationToken cancellationToken, TaskThenOptions options) in C:\XXX\Src\Utility\Tasks\TaskExtensions.cs:line 199
   at XXX.Utility.Tasks.StateMachineTaskFactory`1.TransitionTo(T endTransitionState, CancellationToken cancellationToken, WaitForTransitionCallback`1 waitForTransitionCallback, ValidateTransitionCallback`1 validateTransitionCallback, PreTransitionCallback`1 preTransitionCallback, Object state) in C:\XXX\Src\Utility\Tasks\StateMachineTaskFactory.cs:line 312
   <snip>

So I could see it was one of my Then overloads calling ToBooleanTask. I could then trace through that exact code and the problem became evident very quickly.

This got me curious, though. Why hadn't my original approach of supplementing each task with a name produced any results? I tried reverting my fix, directly naming the task produced by ToBooleanTask, and re-running until I got a failure. Sure enough, I saw the task name in my debugger. So clearly I had somehow missed naming this task originally.

Phew!

score 1 · Answer 2 · answered Nov 19 '12 at 15:00

1

If the number of tasks is manageable, you can use the "Make Object ID" feature in Visual Studio to track each Task:

In your breakpoint in the task constructor, put the task into the Watch window.
Right mouse click on the task in the watch window and choose "Make Object ID". Notice that this puts a 1# at the end of the value. Do this for each task.
Do your workflow. In the task that throws the exception check what number it has.

answered Nov 19 '12 at 15:00

Matt Smith

17,026
7
53
103

Good suggestion: +1. Unfortunately, the extra time taken to perform these steps is causing the test to pass. I tried automating it with a macro, but couldn't get that to work either. – Kent Boogaart Nov 20 '12 at 09:40
1

Alright, when you track it down, be sure to report back--I'm curious. – Matt Smith Nov 20 '12 at 14:23
OK, found the problem and added an answer describing it. – Kent Boogaart Nov 20 '12 at 15:40

score 0 · Answer 3 · answered Nov 19 '12 at 11:37

0

Break in the UnobservedTaskException event and examine the private fields of the Task. You can find the Task one or two levels up in the call stack because the event is being raised by the TaskExceptionHolder class which contains a private field m_task.

The Task object will contain the action that ran as part of its execution.

answered Nov 19 '12 at 11:37

usr

168,620
35
240
369

Hi. I think you misunderstood the problem. I have access to the `Task` already. What I'm trying to do is figure out who *created* the `Task`. BTW, the Task is simply the sender of the UnobservedTaskException event - no need to go digging through the object hierarchy to find it. – Kent Boogaart Nov 19 '12 at 11:43
Ok so what prevents you from finding out who created the task? Is the action field empty or inconclusive? – usr Nov 19 '12 at 11:45
1

The action represents the body of the task. It has nothing to do with who created the task. I know where in my code the action resides, but am not able to pin down why the exception it raises isn't being handled (as far as I can tell, it is always handled with a continuation). Hence, I need to pin down who is creating the failing task so I can ensure that they are handling failures. – Kent Boogaart Nov 19 '12 at 11:51

score 0 · Answer 4 · answered Nov 20 '12 at 14:22

0

If possible, you could change your code that creates Tasks to use the Task Constructor that takes an object: Task(Action<Object>, Object)

Then, at each location where you create the task you could pass it something unique (identifying integer, the call stack, etc.)

Then, in the UnobservedTaskException you could examine this date (stored in Task.AsyncState).

This would help you narrow down whether it is your Task or some other Task.

answered Nov 20 '12 at 14:22

Matt Smith

17,026
7
53
103

Thanks, but associating state with the task is exactly what my naming approach achieves, except it's even more flexible. So given that I named all the tasks I could possibly find and still the faulting task didn't have a name...associating state in the manner you describe won't yield any different results. – Kent Boogaart Nov 20 '12 at 14:38
Almost--but with your solution there is a race condition: The task could fail *before* the naming code executes (because the naming code executes after the task is already started). Also, note that your naming code is not thread safe--in its current form, it could only be used safely if your extension code is always run from the same (typically main) thread. Of course, you could easily fix this latter problem by using a ConcurrentDictionary. – Matt Smith Nov 20 '12 at 15:44
actually, my code already uses `ConcurrentDictionary`. I changed that during this whole debacle, just in case it was related to the problem. Just haven't updated my blog. Also, I don't there's a race condition. Even if the task fails straight away, the UnobservedTaskFailure isn't raised until the task is finalized, which can't have happened if something is still referencing it, such as the method in which it is created (and subsequently named). – Kent Boogaart Nov 20 '12 at 15:53
Good point (about the race condition). As long as you're not relying on the name being there until after your extension code has run, then yes it will be fine. Presumably its only for debug purposes (i.e. you wouldn't look for the name within the Task execution itself). Glad to hear you tracked down the problem. – Matt Smith Nov 20 '12 at 16:03

Debugging a failing, unobserved TPL task

4 Answers4

The Problem

The Method