OK, I have tracked down this problem. The specifics of the bug are perhaps less interesting than the method I used to find it, but I will cover both in separate sections below.
The Problem
Here is part of the offending code:
private static Task<TSuccessor> ThenImpl<TAntecedent, TSuccessor>(Task<TAntecedent> antecedent, Func<Task<TAntecedent>, Task<TSuccessor>> getSuccessor, CancellationToken cancellationToken, TaskThenOptions options)
{
antecedent.AssertNotNull("antecedent");
getSuccessor.AssertNotNull("getSuccessor");
var taskCompletionSource = new TaskCompletionSource<TSuccessor>();
antecedent.ContinueWith(
delegate
{
var evenOnFaulted = options.HasFlag(TaskThenOptions.EvenOnFaulted);
var evenOnCanceled = options.HasFlag(TaskThenOptions.EvenOnCanceled);
if (antecedent.IsFaulted && !evenOnFaulted)
{
taskCompletionSource.TrySetException(antecedent.Exception.InnerExceptions));
}
else if ((antecedent.IsCanceled || cancellationToken.IsCancellationRequested) && !evenOnCanceled)
{
taskCompletionSource.TrySetCanceled();
}
else
{
This method supports my Then
extension methods, which I've blogged about.
Further to the implementation in my blog post, I recently added the ability to run the "then continuation", as I call it, even if the antecedent task faults:
Task.Factory.StartNew(() => { throw new InvalidOperationException(); })
.Then(() => Console.WriteLine("Executed"), TaskThenOptions.EvenOnFaulted);
This will result in the initial exception being ignored and "Executed" being output on the console. However, the problem is my ThenImpl
does not observe the original exception. To that end, I changed this line:
if (antecedent.IsFaulted && !evenOnFaulted)
to this:
if (antecedent.Exception != null && !evenOnFaulted)
And now I don't get the problem.
Now, you may wonder why this was at all difficult to track down. The thing is, I have a lot of task compositional methods that facilitate advanced scenarios. Here's an actual snippet to give you some idea of the resultant power:
private Task OnConnectAsync(CancellationToken cancellationToken, object state)
{
var firstAttempt = true;
var retryOnFailureTask = TaskUtil
.RetryOnFailure(
() => TaskUtil.Delay(firstAttempt ? TimeSpan.Zero : this.reconnectDelay, cancellationToken)
.Then(
x =>
{
if (!firstAttempt)
{
Interlocked.Increment(ref this.connectionAttempts);
}
firstAttempt = false;
})
.Then(x => this.loggerService.Debug("Attempting to connect communications service (attempt #{0}).", this.connectionAttempts), cancellationToken)
.Then(x => this.communicationsService.ConnectAsync(cancellationToken), cancellationToken)
.Then(x => this.loggerService.Debug("Successfully connected communications service (attempt #{0}).", this.connectionAttempts), cancellationToken)
.Then(x => this.communicationsService.AuthenticateAsync(cancellationToken), cancellationToken)
.Then(x => this.loggerService.Debug("Successfully authenticated communications service (attempt #{0}).", this.connectionAttempts), cancellationToken)
.Then(x => this.ReviveActiveStreamsAsync(cancellationToken), cancellationToken)
.Then(x => this.loggerService.Debug("Successfully revived streams (attempt #{0}).", this.connectionAttempts), cancellationToken),
null,
cancellationToken);
return retryOnFailureTask;
}
Note the custom RetryOnFailure
, Then
, and Delay
methods. That's a good taste of what I'm talking about.
The drawback of this, of course, is tracking down problems when they occur. I can't help but feel the TPL does a poor job in this regard. To my mind, each Task
should include information about who created it. At the very least, there should be hooks in the TPL (eg. TaskCreated
event) so that developers can supplement tasks with their own debug information. The situation may well have improved with .NET 4.5 - I am using .NET 4.0 though.
The Method
The key to tracking down the problem was in laboriously wrapping each Task
that I created with a TaskCompletionSource
that wraps any exception with a supplementary message. For example, here is a ToBooleanTask
extension method that I have beforehand:
public static Task<bool> ToBooleanTask(this Task task)
{
var taskCompletionSource = new TaskCompletionSource<bool>();
task.ContinueWith(
x =>
{
if (x.IsFaulted)
{
taskCompletionSource.TrySetException(x.Exception.GetBaseException());
}
else if (x.IsCanceled)
{
taskCompletionSource.TrySetCanceled();
}
else
{
taskCompletionSource.TrySetResult(true);
}
});
return taskCompletionSource.Task;
}
And here it is after making this change:
public static Task<bool> ToBooleanTask(this Task task)
{
var taskCompletionSource = new TaskCompletionSource<bool>();
task.ContinueWith(
x =>
{
if (x.IsFaulted)
{
taskCompletionSource.TrySetException(new InvalidOperationException("Failure in to boolean task", x.Exception.GetBaseException()));
}
else if (x.IsCanceled)
{
taskCompletionSource.TrySetCanceled();
}
else
{
taskCompletionSource.TrySetResult(true);
}
});
return taskCompletionSource.Task;
}
In this case I already have a TaskCompletionSource
, so it was straightforward. In other cases, I had to explicitly create a TaskCompletionSource
and forward any fault/cancel/result from the underlying Task
onto the TaskCompletionSource
.
Aside: you may wonder at the use of the ToBooleanTask
extension method. It's incredibly useful where you want to implement a single method that handles both generic and non-generic tasks. You can implement the generic version and then have your non-generic overload call ToBooleanTask
in order to create a generic task, which can then be passed onto your generic overload.
Once I had gone through all the possible culprits and supplemented them as per above, I re-ran my test until it failed and noticed that it was, indeed ToBooleanTask
that was creating the task that was not being observed. Therefore, I modified it to:
public static Task<bool> ToBooleanTask(this Task task)
{
var stackTrace = new System.Diagnostics.StackTrace(true);
var taskCompletionSource = new TaskCompletionSource<bool>();
task.ContinueWith(
x =>
{
if (x.IsFaulted)
{
taskCompletionSource.TrySetException(new InvalidOperationException("Failure in to boolean task with stack trace: " + stackTrace, x.Exception.GetBaseException()));
}
else if (x.IsCanceled)
{
taskCompletionSource.TrySetCanceled();
}
else
{
taskCompletionSource.TrySetResult(true);
}
});
return taskCompletionSource.Task;
}
This would give me a full stack trace when the failure occurred. I re-ran my test until it failed, and - hooray! - got the information I needed to track down the issue:
Failure in to boolean task with stack trace: at XXX.Utility.Tasks.TaskExtensions.ToBooleanTask(Task task) in C:\XXX\Src\Utility\Tasks\TaskExtensions.cs:line 110
at XXX.Utility.Tasks.TaskExtensions.Then(Task antecedent, Func`2 getSuccessor, CancellationToken cancellationToken, TaskThenOptions options) in C:\XXX\Src\Utility\Tasks\TaskExtensions.cs:line 199
at XXX.Utility.Tasks.StateMachineTaskFactory`1.TransitionTo(T endTransitionState, CancellationToken cancellationToken, WaitForTransitionCallback`1 waitForTransitionCallback, ValidateTransitionCallback`1 validateTransitionCallback, PreTransitionCallback`1 preTransitionCallback, Object state) in C:\XXX\Src\Utility\Tasks\StateMachineTaskFactory.cs:line 312
<snip>
So I could see it was one of my Then
overloads calling ToBooleanTask
. I could then trace through that exact code and the problem became evident very quickly.
This got me curious, though. Why hadn't my original approach of supplementing each task with a name produced any results? I tried reverting my fix, directly naming the task produced by ToBooleanTask
, and re-running until I got a failure. Sure enough, I saw the task name in my debugger. So clearly I had somehow missed naming this task originally.
Phew!