19

Trace.CorrelationManager.LogicalOperationStack enables having nested logical operation identifiers where the most common case is logging (NDC). Should it still work with async-await?

Here's a simple example using LogicalFlow which is my simple wrapper over the LogicalOperationStack:

private static void Main() => OuterOperationAsync().GetAwaiter().GetResult();

private static async Task OuterOperationAsync()
{
    Console.WriteLine(LogicalFlow.CurrentOperationId);
    using (LogicalFlow.StartScope())
    {
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
        await InnerOperationAsync();
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
        await InnerOperationAsync();
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
    }
    Console.WriteLine(LogicalFlow.CurrentOperationId);
}

private static async Task InnerOperationAsync()
{
    using (LogicalFlow.StartScope())
    {
        await Task.Delay(100);
    }
}

LogicalFlow:

public static class LogicalFlow
{
    public static Guid CurrentOperationId =>
        Trace.CorrelationManager.LogicalOperationStack.Count > 0
            ? (Guid) Trace.CorrelationManager.LogicalOperationStack.Peek()
            : Guid.Empty;

    public static IDisposable StartScope()
    {
        Trace.CorrelationManager.StartLogicalOperation();
        return new Stopper();
    }

    private static void StopScope() => 
        Trace.CorrelationManager.StopLogicalOperation();

    private class Stopper : IDisposable
    {
        private bool _isDisposed;
        public void Dispose()
        {
            if (!_isDisposed)
            {
                StopScope();
                _isDisposed = true;
            }
        }
    }
}

Output:

00000000-0000-0000-0000-000000000000
    49985135-1e39-404c-834a-9f12026d9b65
    54674452-e1c5-4b1b-91ed-6bd6ea725b98
    c6ec00fd-bff8-4bde-bf70-e073b6714ae5
54674452-e1c5-4b1b-91ed-6bd6ea725b98

The specific values don't really matter, but as I understand it both the outer lines should show Guid.Empty (i.e. 00000000-0000-0000-0000-000000000000) and the inner lines should show the same Guid value.

You might say that LogicalOperationStack is using a Stack which is not thread-safe and that's why the output is wrong. But while that's true in general, in this case there's never more than a single thread accessing the LogicalOperationStack at the same time (every async operation is awaited when called and no use of combinators such as Task.WhenAll)

The issue is that LogicalOperationStack is stored in the CallContext which has a copy-on-write behavior. That means that as long as you don't explicitly set something in the CallContext (and you don't when you add to an existing stack with StartLogicalOperation) you're using the parent context and not your own.

This can be shown by simply setting anything into the CallContext before adding to the existing stack. For example if we changed StartScope to this:

public static IDisposable StartScope()
{
    CallContext.LogicalSetData("Bar", "Arnon");
    Trace.CorrelationManager.StartLogicalOperation();
    return new Stopper();
}

The output is:

00000000-0000-0000-0000-000000000000
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
00000000-0000-0000-0000-000000000000

Note: I'm not suggesting anyone actually do this. The real practical solution would be to use an ImmutableStack instead of the LogicalOperationStack as it's both thread-safe and since it's immutable when you call Pop you get back a new ImmutableStack that you then need to set back into the CallContext. A full implementation is available as an answer to this question: Tracking c#/.NET tasks flow

So, should LogicalOperationStack work with async and it's just a bug? Is LogicalOperationStack just not meant for the async world? Or am I missing something?


Update: Using Task.Delay is apparently confusing as it uses System.Threading.Timer which captures the ExecutionContext internally. Using await Task.Yield(); instead of await Task.Delay(100); makes the example easier to understand.

Community
  • 1
  • 1
i3arnon
  • 113,022
  • 33
  • 324
  • 344
  • 1
    Possibly a duplicate: [Is CorrelationManager.LogicalOperationStack compatible with Parallel.For, Tasks, Threads, etc](http://stackoverflow.com/questions/4729479/is-correlationmanager-logicaloperationstack-compatible-with-parallel-for-tasks) – noseratio Feb 27 '15 at 20:05
  • @Noseratio nope. That's not about `async-await` but about a bug that was already fixed. – i3arnon Feb 27 '15 at 20:12
  • Agreed, not a dupe. Did you check if it works as expected on the same thread, e.g. if you replace `await Task.Delay(100)` with `Task.Delay(100).Wait()`? – noseratio Feb 27 '15 at 21:00
  • @Noseratio yes. It works of course, because there's only a single thread (and so a single `CallContext`). It's as if the method wasn't `async` to begin with. – i3arnon Feb 27 '15 at 21:01
  • Looks like a bug to me. [Here](http://referencesource.microsoft.com/#mscorlib/system/runtime/remoting/callcontext.cs,f0df09325f655c90,references) it gets merged only if it has new user data (`lc.HasUserData`) on `LogicalCallContext` itself (as you code shows), but not on the `s_CorrelationMgrSlotName` child slot, which is itself a dictiionary. – noseratio Feb 27 '15 at 21:28
  • @Noseratio I think that's irrelevant. AFAIK `LogicalCallContext` is used to transfer `CallContext` items to different `AppDomain`s. `Merge` is only used by `RealProxy` which is the class behind remoting and WCF. – i3arnon Feb 27 '15 at 21:40
  • Note sure about `Merge` itself, but `LogicalCallContext` is used inside `ExecutionContext` to flow data set with `CallContext.LogicalSetData`, [here](http://referencesource.microsoft.com/#mscorlib/system/runtime/remoting/callcontext.cs,946b188eee23f5b9,references). That's how `LogicalSetData/LogicalGetData` gets flowed across async calls within the same appdomain, too. – noseratio Feb 27 '15 at 21:51
  • Overall, I think the design of `ExecutionContext` flow (including its one-level-only copy-on-write logic) is a bit twisted. E.g., they don't have a public API to suppress `SynchronizationContext` flow when you use `ExecutionContext.Capture`/`Run`, which is a problem when implementing `ICriticalNotifyCompletion`. – noseratio Feb 27 '15 at 21:52
  • 1
    @downvoter something wrong? – i3arnon Apr 30 '15 at 21:18
  • 1
    It's a great technical question. Perhaps, the downvoter simply doesn't understand what it's all about. – noseratio Apr 30 '15 at 22:48
  • Related blog post by Stephen Cleary, explaining the issue with mutable data in the logical call context: https://blog.stephencleary.com/2013/04/implicit-async-context-asynclocal.html – Ohad Schneider Aug 12 '17 at 23:03

3 Answers3

13

Yes, LogicalOperationStack should work with async-await and it is a bug that it doesn't.

I've contacted the relevant developer at Microsoft and his response was this:

"I wasn't aware of this, but it does seem broken. The copy-on-write logic is supposed to behave exactly as if we'd really created a copy of the ExecutionContext on entry into the method. However, copying the ExecutionContext would have created a deep copy of the CorrelationManager context, as it's special-cased in CallContext.Clone(). We don't take that into account in the copy-on-write logic."

Moreover, he recommended using the new System.Threading.AsyncLocal<T> class added in .Net 4.6 instead which should handle that issue correctly.

So, I went ahead and implemented LogicalFlow on top of an AsyncLocal instead of the LogicalOperationStack using VS2015 RC and .Net 4.6:

public static class LogicalFlow
{
    private static AsyncLocal<Stack> _asyncLogicalOperationStack = new AsyncLocal<Stack>();

    private static Stack AsyncLogicalOperationStack
    {
        get
        {
            if (_asyncLogicalOperationStack.Value == null)
            {
                _asyncLogicalOperationStack.Value = new Stack();
            }

            return _asyncLogicalOperationStack.Value;
        }
    }

    public static Guid CurrentOperationId =>
        AsyncLogicalOperationStack.Count > 0
            ? (Guid)AsyncLogicalOperationStack.Peek()
            : Guid.Empty;

    public static IDisposable StartScope()
    {
        AsyncLogicalOperationStack.Push(Guid.NewGuid());
        return new Stopper();
    }

    private static void StopScope() =>
        AsyncLogicalOperationStack.Pop();
}

And the output for the same test is indeed as it should be:

00000000-0000-0000-0000-000000000000
    ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
    ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
    ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
00000000-0000-0000-0000-000000000000
i3arnon
  • 113,022
  • 33
  • 324
  • 344
  • 2
    I'm curious, how did you contact the relevant developer at Microsoft? [The bug at Connect](http://connect.microsoft.com/VisualStudio/feedback/details/1114703/logicaloperationstack-doesnt-work-with-async) doesn't have any feedback from Microsoft yet. – drowa May 14 '15 at 17:31
  • Is there any way AsyncLocal behavior can be achieved in .net 4.5? – Filip Stas May 25 '16 at 09:56
  • @FilipStas maybe, but I'm not sure how. – i3arnon May 25 '16 at 10:38
  • @i3arnon I think there may be a problem with this code, say you created scope on a main thread and stopped it, then you try to create a new scope from the inside of a method running in QueueUserWorkItem, AsyncLogicalOperationStack getter will return _asyncLogicalOperationStack.Value since it was initialized previously, but .Value will be null. I can repro this behavior in windows service trying to start scope in OnStart after creating/stopping at least 1 scope in constructor – illegal-immigrant Jun 23 '16 at 19:00
  • 1
    @taras.roshko true, because I created the AsyncLocal once and set it's value once when I need to set it for each async flow separately. I updated the answer accordingly – i3arnon Jun 24 '16 at 12:52
  • 1
    @i3arnon Will your `LogicalFlow` class behave properly in a scenario when several parallel tasks are started in a child operation and are creating their own child operations? In other words, is this class thread safe? – Gebb May 02 '17 at 11:19
  • 1
    I don't believe this solution is good enough. Like you said in a comment below, deep cloning is required otherwise several threads end up with the same (non-thread-safe stack). If I do something like `using (LogicalFlow.StartScope()) { Task.WaitAll(Task.Run(InnerOperationAsync), Task.Run(InnerOperationAsync)) }`, all 3 threads (parent and two children) will have the same `Stack` instance (as I verified with `Object.ReferenceEquals`) . What you did here however should work fine as it utilizes deep copying: https://stackoverflow.com/a/24468972/67824. – Ohad Schneider Aug 13 '17 at 10:54
  • I have to deplore the lack of proactivity from Microsoft: they answered privately to a person who raised the bug, but they didn't update their documentation! – Johan Boulé Aug 10 '18 at 15:11
6

If you're still interested in this, I believe it's a bug in how they flow LogicalOperationStack and I think it's a good idea to report it.

They give special treatment to LogicalOperationStack's stack here in LogicalCallContext.Clone, by doing a deep copy (unlike with other data stored via CallContext.LogicalSetData/LogicalGetData, on which only a shallow copy is performed).

This LogicalCallContext.Clone is called every time ExecutionContext.CreateCopy or ExecutionContext.CreateMutableCopy is called to flow the ExecutionContext.

Based on your code, I did a little experiment by providing my own mutable stack for "System.Diagnostics.Trace.CorrelationManagerSlot" slot in LogicalCallContext, to see when and how many times it actually gets cloned.

The code:

using System;
using System.Collections;
using System.Diagnostics;
using System.Linq;
using System.Runtime.Remoting.Messaging;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        static readonly string CorrelationManagerSlot = "System.Diagnostics.Trace.CorrelationManagerSlot";

        public static void ShowCorrelationManagerStack(object where)
        {
            object top = "null";
            var stack = (MyStack)CallContext.LogicalGetData(CorrelationManagerSlot);
            if (stack.Count > 0)
                top = stack.Peek();

            Console.WriteLine("{0}: MyStack Id={1}, Count={2}, on thread {3}, top: {4}",
                where, stack.Id, stack.Count, Environment.CurrentManagedThreadId, top);
        }

        private static void Main()
        {
            CallContext.LogicalSetData(CorrelationManagerSlot, new MyStack());

            OuterOperationAsync().Wait();
            Console.ReadLine();
        }

        private static async Task OuterOperationAsync()
        {
            ShowCorrelationManagerStack(1.1);

            using (LogicalFlow.StartScope())
            {
                ShowCorrelationManagerStack(1.2);
                Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
                await InnerOperationAsync();
                ShowCorrelationManagerStack(1.3);
                Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
                await InnerOperationAsync();
                ShowCorrelationManagerStack(1.4);
                Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
            }

            ShowCorrelationManagerStack(1.5);
        }

        private static async Task InnerOperationAsync()
        {
            ShowCorrelationManagerStack(2.1);
            using (LogicalFlow.StartScope())
            {
                ShowCorrelationManagerStack(2.2);
                await Task.Delay(100);
                ShowCorrelationManagerStack(2.3);
            }
            ShowCorrelationManagerStack(2.4);
        }
    }

    public class MyStack : Stack, ICloneable
    {
        public static int s_Id = 0;

        public int Id { get; private set; }

        object ICloneable.Clone()
        {
            var cloneId = Interlocked.Increment(ref s_Id); ;
            Console.WriteLine("Cloning MyStack Id={0} into {1} on thread {2}", this.Id, cloneId, Environment.CurrentManagedThreadId);

            var clone = new MyStack();
            clone.Id = cloneId;

            foreach (var item in this.ToArray().Reverse())
                clone.Push(item);

            return clone;
        }
    }

    public static class LogicalFlow
    {
        public static Guid CurrentOperationId
        {
            get
            {
                return Trace.CorrelationManager.LogicalOperationStack.Count > 0
                    ? (Guid)Trace.CorrelationManager.LogicalOperationStack.Peek()
                    : Guid.Empty;
            }
        }

        public static IDisposable StartScope()
        {
            Program.ShowCorrelationManagerStack("Before StartLogicalOperation");
            Trace.CorrelationManager.StartLogicalOperation();
            Program.ShowCorrelationManagerStack("After StartLogicalOperation");
            return new Stopper();
        }

        private static void StopScope()
        {
            Program.ShowCorrelationManagerStack("Before StopLogicalOperation");
            Trace.CorrelationManager.StopLogicalOperation();
            Program.ShowCorrelationManagerStack("After StopLogicalOperation");
        }

        private class Stopper : IDisposable
        {
            private bool _isDisposed;
            public void Dispose()
            {
                if (!_isDisposed)
                {
                    StopScope();
                    _isDisposed = true;
                }
            }
        }
    }
}

The result is quite surprising. Even though there're only two threads involved in this async workflow, the stack gets cloned as many as 4 times. And the problem is, the matching Stack.Push and Stack.Pop operations (called by StartLogicalOperation/StopLogicalOperation) operate on the different, non-matching clones of the stack, thus disbalancing the "logical" stack. That's where the bug lays in.

This indeed makes LogicalOperationStack totally unusable across async calls, even though there's no concurrent forks of tasks.

Updated, I also did a little research about how it may behave for synchronous calls, to address these comments:

Agreed, not a dupe. Did you check if it works as expected on the same thread, e.g. if you replace await Task.Delay(100) with Task.Delay(100).Wait()? – Noseratio Feb 27 at 21:00

@Noseratio yes. It works of course, because there's only a single thread (and so a single CallContext). It's as if the method wasn't async to begin with. – i3arnon Feb 27 at 21:01

Single thread doesn't mean single CallContext. Even for synchronous continuations on the same single thread the execution context (and its inner LogicalCallContext) can get cloned. Example, using the above code:

private static void Main()
{
    CallContext.LogicalSetData(CorrelationManagerSlot, new MyStack());

    ShowCorrelationManagerStack(0.1);

    CallContext.LogicalSetData("slot1", "value1");
    Console.WriteLine(CallContext.LogicalGetData("slot1"));

    Task.FromResult(0).ContinueWith(t =>
        {
            ShowCorrelationManagerStack(0.2);

            CallContext.LogicalSetData("slot1", "value2");
            Console.WriteLine(CallContext.LogicalGetData("slot1"));
        }, 
        CancellationToken.None,
        TaskContinuationOptions.ExecuteSynchronously,
        TaskScheduler.Default);

    ShowCorrelationManagerStack(0.3);
    Console.WriteLine(CallContext.LogicalGetData("slot1"));

    // ...
}

Output (note how we lose "value2"):

0.1: MyStack Id=0, Count=0, on thread 9, top:
value1
Cloning MyStack Id=0 into 1 on thread 9
0.2: MyStack Id=1, Count=0, on thread 9, top:
value2
0.3: MyStack Id=0, Count=0, on thread 9, top:
value1
Community
  • 1
  • 1
noseratio
  • 59,932
  • 34
  • 208
  • 486
  • `Task.Delay` captures the `ExecutionContext` internally so it's not a surprise you have 4 clones. You can use `Task.Yield` to make it simpler to understand. With `Task.Yield` you only get the expected 2 clones. The issue, as I described, is that because of the "copy-on-write" behavior you use the parent context when you shouldn't. In your example it manifests as the 2 clones are cloned from the same original stack. – i3arnon Mar 08 '15 at 18:01
  • About your update: I didn't state that the stack won't get cloned with `Task.Delay(100).Wait()` (actually, it does get cloned in this case, it doesn't with `Task.Yield.GetAwaiter().GetResult()` though). I've stated that it works as expected as the cloned stack isn't used by the single thread that's running the code (just like your `"value2"`) – i3arnon Mar 08 '15 at 18:01
  • Also, your `MyStack.Clone` doesn't clone the stack correctly as it reverses it each time. – i3arnon Mar 08 '15 at 18:01
  • @i3arnon, there's indeed a bug which can be fixed as `foreach (var item in this.ToArray().Reverse())`. I don't think that changes the main idea that `Push`/`Pop` operates on different clones of the stack here. I'll verify that later. – noseratio Mar 08 '15 at 23:10
  • Of course they do, that's the premise of the question. Both methods `Push` to the same stack, then it's cloned twice (once per `await`). The first `Pop` happens on the first clone (which has 2 items) in the inner method and the second one happens on the second clone which also has 2 items (although it should have only one). That happens because the inner method added to the stack of the outer one as there was no copy because of copy-on-write. – i3arnon Mar 08 '15 at 23:27
  • @i3arnon, I'd say, in the 1st place it's happening because they treat it specially - with deep cloning - for `"System.Diagnostics.Trace.CorrelationManagerSlot"` data slot in `LogicalCallContext.Clone`, unlike with other slots. I imagine they do it to support task forking and multithreading, but it doesn't work anyway. – noseratio Mar 08 '15 at 23:44
  • No, without deep-cloning only the stack's reference will be cloned and that will lead to multiple threads operating on the same non-thread-safe `Stack` instance. Even when there's only a single thread it still wouldn't work for fork/join scenarios. Deep-cloning is better in this case than shallow-cloning and copy-on-push is better than copy-on-write. – i3arnon Mar 09 '15 at 10:13
0

One of the solutions mentioned here and on the web is to call LogicalSetData on context:

CallContext.LogicalSetData("one", null);
Trace.CorrelationManager.StartLogicalOperation();

But in fact, just reading current execution context is enough:

var context = Thread.CurrentThread.ExecutionContext;
Trace.CorrelationManager.StartLogicalOperation();
xll
  • 3,019
  • 2
  • 16
  • 19