28

I'm dealing with a situation where a managed object gets prematurely finalized in the middle of async method.

This is a hobby home automation project (Windows 8.1, .NET 4.5.1), where I supply a C# callback to an unmanaged 3rd party DLL. The callback gets invoked upon a certain sensor event.

To handle the event, I use async/await and a simple custom awaiter (rather than TaskCompletionSource). I do it this way partly to reduce the number of unnecessary allocations, but mostly out of curiosity as a learning exercise.

Below is a very stripped version of what I have, using a Win32 timer-queue timer to simulate the unmanaged event source. Let's start with the output:

Press Enter to exit...
Awaiter()
tick: 0
tick: 1
~Awaiter()
tick: 2
tick: 3
tick: 4

Note how my awaiter gets finalized after the second tick. This is unexpected.

The code (a console app):

using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        static async Task TestAsync()
        {
            var awaiter = new Awaiter();
            //var hold = GCHandle.Alloc(awaiter);

            WaitOrTimerCallbackProc callback = (a, b) =>
                awaiter.Continue();

            IntPtr timerHandle;
            if (!CreateTimerQueueTimer(out timerHandle, 
                    IntPtr.Zero, 
                    callback, 
                    IntPtr.Zero, 500, 500, 0))
                throw new System.ComponentModel.Win32Exception(
                    Marshal.GetLastWin32Error());

            var i = 0;
            while (true)
            {
                await awaiter;
                Console.WriteLine("tick: " + i++);
            }
        }

        static void Main(string[] args)
        {
            Console.WriteLine("Press Enter to exit...");
            var task = TestAsync();
            Thread.Sleep(1000);
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Console.ReadLine();
        }

        // custom awaiter
        public class Awaiter : 
            System.Runtime.CompilerServices.INotifyCompletion
        {
            Action _continuation;

            public Awaiter()
            {
                Console.WriteLine("Awaiter()");
            }

            ~Awaiter()
            {
                Console.WriteLine("~Awaiter()");
            }

            // resume after await, called upon external event
            public void Continue()
            {
                var continuation = Interlocked.Exchange(ref _continuation, null);
                if (continuation != null)
                    continuation();
            }

            // custom Awaiter methods
            public Awaiter GetAwaiter()
            {
                return this;
            }

            public bool IsCompleted
            {
                get { return false; }
            }

            public void GetResult()
            {
            }

            // INotifyCompletion
            public void OnCompleted(Action continuation)
            {
                Volatile.Write(ref _continuation, continuation);
            }
        }

        // p/invoke
        delegate void WaitOrTimerCallbackProc(IntPtr lpParameter, bool TimerOrWaitFired);

        [DllImport("kernel32.dll")]
        static extern bool CreateTimerQueueTimer(out IntPtr phNewTimer,
           IntPtr TimerQueue, WaitOrTimerCallbackProc Callback, IntPtr Parameter,
           uint DueTime, uint Period, uint Flags);
    }
}

I managed to suppress the collection of awaiter with this line:

var hold = GCHandle.Alloc(awaiter);

However I don't fully understand why I have to create a strong reference like this. The awaiter is referenced inside an endless loop. AFAICT, it is not going out of scope until the task returned by TestAsync becomes completed (cancelled/faulted). And the task itself is referenced inside Main forever.

Eventually, I reduced TestAsync to just this:

static async Task TestAsync()
{
    var awaiter = new Awaiter();
    //var hold = GCHandle.Alloc(awaiter);

    var i = 0;
    while (true)
    {
        await awaiter;
        Console.WriteLine("tick: " + i++);
    }
}

The collection still takes place. I suspect the whole compiler-generated state machine object is getting collected. Can someone please explain why this is happening?

Now, with the following minor modification, the awaiter no longer gets garbage-collected:

static async Task TestAsync()
{
    var awaiter = new Awaiter();
    //var hold = GCHandle.Alloc(awaiter);

    var i = 0;
    while (true)
    {
        //await awaiter;
        await Task.Delay(500);
        Console.WriteLine("tick: " + i++);
    }
}

Updated, this fiddle shows how the awaiter object gets garbage-collected without any p/invoke code. I think, the reason might be that there is no external references to awaiter outside the initial state of the generated state machine object. I need to study the compiler-generated code.


Updated, here's the compiler-generated code (for this fiddle, VS2012). Apparently, the Task returned by stateMachine.t__builder.Task doesn't keep a reference to (or rather, a copy of) the state machine itself (stateMachine). Am I missing something?

    private static Task TestAsync()
    {
      Program.TestAsyncd__0 stateMachine;
      stateMachine.t__builder = AsyncTaskMethodBuilder.Create();
      stateMachine.1__state = -1;
      stateMachine.t__builder.Start<Program.TestAsyncd__0>(ref stateMachine);
      return stateMachine.t__builder.Task;
    }

    [CompilerGenerated]
    [StructLayout(LayoutKind.Auto)]
    private struct TestAsyncd__0 : IAsyncStateMachine
    {
      public int 1__state;
      public AsyncTaskMethodBuilder t__builder;
      public Program.Awaiter awaiter5__1;
      public int i5__2;
      private object u__awaiter3;
      private object t__stack;

      void IAsyncStateMachine.MoveNext()
      {
        try
        {
          bool flag = true;
          Program.Awaiter awaiter;
          switch (this.1__state)
          {
            case -3:
              goto label_7;
            case 0:
              awaiter = (Program.Awaiter) this.u__awaiter3;
              this.u__awaiter3 = (object) null;
              this.1__state = -1;
              break;
            default:
              this.awaiter5__1 = new Program.Awaiter();
              this.i5__2 = 0;
              goto label_5;
          }
label_4:
          awaiter.GetResult();
          Console.WriteLine("tick: " + (object) this.i5__2++);
label_5:
          awaiter = this.awaiter5__1.GetAwaiter();
          if (!awaiter.IsCompleted)
          {
            this.1__state = 0;
            this.u__awaiter3 = (object) awaiter;
            this.t__builder.AwaitOnCompleted<Program.Awaiter, Program.TestAsyncd__0>(ref awaiter, ref this);
            flag = false;
            return;
          }
          else
            goto label_4;
        }
        catch (Exception ex)
        {
          this.1__state = -2;
          this.t__builder.SetException(ex);
          return;
        }
label_7:
        this.1__state = -2;
        this.t__builder.SetResult();
      }

      [DebuggerHidden]
      void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0)
      {
        this.t__builder.SetStateMachine(param0);
      }
    }
noseratio
  • 59,932
  • 34
  • 208
  • 486
  • Have you tried to pin the object? – aybe Mar 08 '14 at 11:20
  • I mean to use the overload where you explicitly specify it : http://msdn.microsoft.com/fr-fr/library/1246yz8f(v=vs.110).aspx The one you use does Normal, not Pinned. For the cause sorry but I can't really tell. – aybe Mar 08 '14 at 11:27
  • My answer below seemed like a solid theory.. but my experiments are really starting to throw me off now. I gave the `Awaiter` a `Guid` property and I am printing it to the console in every call to `Continue`... it fires perfectly.. even after the finalizer has supposedly fired. I really am at a loss... that just doesn't seem possible.. unless the entire thing has been inlined by the JIT compiler. – Simon Whitehead Mar 08 '14 at 14:12
  • Not that I have any idea how to use it, but solving this will likely require you to break out [windbg and sos](http://msdn.microsoft.com/en-us/library/bb190764%28v=vs.110%29.aspx). Using `sos` is a good skill to learn (I have only dipped my toes in to it to track down a OOM issue once) and it may help you figure this out. (BTW, thanks again for another great question you will have more rep than me by the end of the year I bet ;) ) – Scott Chamberlain Mar 08 '14 at 16:36
  • 1
    GCHandle.Alloc() is required here. But not on the awaiter, you *must* keep the *callback* delegate object alive. Which in turn ensures that the awaiter stays alive as well. Hard requirement, and a very common need for callbacks from unmanaged code, the GC has no idea that the native code has an implicit reference on it. – Hans Passant Mar 08 '14 at 18:14
  • @SimonWhitehead, you had some interesting thoughts in your answer, you shouldn't have deleted it. It does make sense, same as Hans' answer, but it doesn't explain while the same thing happens *without* any p/invoke stuff (the 2nd code fragment). Fiddle: http://dotnetfiddle.net/pWaoN1. – noseratio Mar 08 '14 at 21:44
  • @ScottChamberlain, thanks for the encouraging words :) SOS looks pretty interesting, I've never looked at it. – noseratio Mar 08 '14 at 21:50
  • 1
    FYI, I have submitted this question to The Bug Guys. Hopefully Eric Lippert might have some interesting input into this question! – Simon Whitehead Mar 09 '14 at 00:37

1 Answers1

19

I've removed all p/invoke stuff and re-created a simplified version of the compiler-generated state machine logic. It exhibits the same behavior: the awaiter gets garabage-collected after the first invocation of the state machine's MoveNext method.

Microsoft has recently done an excellent job on providing the Web UI to their .NET reference sources, that's been very helpful. After studying the implementation of AsyncTaskMethodBuilder and, most importantly, AsyncMethodBuilderCore.GetCompletionAction, I now believe the GC behavior I'm seeing makes perfect sense. I'll try to explain that below.

The code:

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Runtime.InteropServices;
using System.Runtime.CompilerServices;

namespace ConsoleApplication
{
    public class Program
    {
        // Original version with async/await

        /*
        static async Task TestAsync()
        {
            Console.WriteLine("Enter TestAsync");
            var awaiter = new Awaiter();
            //var hold = GCHandle.Alloc(awaiter);

            var i = 0;
            while (true)
            {
                await awaiter;
                Console.WriteLine("tick: " + i++);
            }
            Console.WriteLine("Exit TestAsync");
        }
        */

        // Manually coded state machine version

        struct StateMachine: IAsyncStateMachine
        {
            public int _state;
            public Awaiter _awaiter;
            public AsyncTaskMethodBuilder _builder;

            public void MoveNext()
            {
                Console.WriteLine("StateMachine.MoveNext, state: " + this._state);
                switch (this._state)
                {
                    case -1:
                        {
                            this._awaiter = new Awaiter();
                            goto case 0;
                        };
                    case 0:
                        {
                            this._state = 0;
                            var awaiter = this._awaiter;
                            this._builder.AwaitOnCompleted(ref awaiter, ref this);
                            return;
                        };

                    default:
                        throw new InvalidOperationException();
                }
            }

            public void SetStateMachine(IAsyncStateMachine stateMachine)
            {
                Console.WriteLine("StateMachine.SetStateMachine, state: " + this._state);
                this._builder.SetStateMachine(stateMachine);
                // s_strongRef = stateMachine;
            }

            static object s_strongRef = null;
        }

        static Task TestAsync()
        {
            StateMachine stateMachine = new StateMachine();
            stateMachine._state = -1;

            stateMachine._builder = AsyncTaskMethodBuilder.Create();
            stateMachine._builder.Start(ref stateMachine);

            return stateMachine._builder.Task;
        }

        public static void Main(string[] args)
        {
            var task = TestAsync();
            Thread.Sleep(1000);
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Console.WriteLine("Press Enter to exit...");
            Console.ReadLine();
        }

        // custom awaiter
        public class Awaiter :
            System.Runtime.CompilerServices.INotifyCompletion
        {
            Action _continuation;

            public Awaiter()
            {
                Console.WriteLine("Awaiter()");
            }

            ~Awaiter()
            {
                Console.WriteLine("~Awaiter()");
            }

            // resume after await, called upon external event
            public void Continue()
            {
                var continuation = Interlocked.Exchange(ref _continuation, null);
                if (continuation != null)
                    continuation();
            }

            // custom Awaiter methods
            public Awaiter GetAwaiter()
            {
                return this;
            }

            public bool IsCompleted
            {
                get { return false; }
            }

            public void GetResult()
            {
            }

            // INotifyCompletion
            public void OnCompleted(Action continuation)
            {
                Console.WriteLine("Awaiter.OnCompleted");
                Volatile.Write(ref _continuation, continuation);
            }
        }
    }
}

The compiler-generated state machine is a mutable struct, being passed over by ref. Apparently, this is an optimization to avoid extra allocations.

The core part of this is taking place inside AsyncMethodBuilderCore.GetCompletionAction, where the current state machine struct gets boxed, and the reference to the boxed copy is kept by the continuation callback passed to INotifyCompletion.OnCompleted.

This is the only reference to the state machine which has a chance to stand the GC and survive after await. The Task object returned by TestAsync does not hold a reference to it, only the await continuation callback does. I believe this is done on purpose, to preserve the efficient GC behavior.

Note the commented line:

// s_strongRef = stateMachine;

If I un-comment it, the boxed copy of the state machine doesn't get GC'ed, and awaiter stays alive as a part of it. Of course, this is not a solution, but it illustrates the problem.

So, I've come to the following conclusion. While an async operation is in "in-flight" and none of the state machine's states (MoveNext) is currently being executed, it's the responsibility of the "keeper" of the continuation callback to put a strong hold on the callback itself, to make sure the boxed copy of the state machine does not get garbage-collected.

For example, in case with YieldAwaitable (returned by Task.Yield), the external reference to the continuation callback is kept by the ThreadPool task scheduler, as a result of ThreadPool.QueueUserWorkItem call. In case with Task.GetAwaiter, it is indirectly referenced by the task object.

In my case, the "keeper" of the continuation callback is the Awaiter itself.

Thus, as long as there is no external references to the continuation callback the CLR is aware of (outside the state machine object), the custom awaiter should take steps to keep the callback object alive. This, in turn, would keep alive the whole state machine. The following steps would be necessary in this case:

  1. Call the GCHandle.Alloc on the callback upon INotifyCompletion.OnCompleted.
  2. Call GCHandle.Free when the async event has actually happened, before invoking the continuation callback.
  3. Implement IDispose to call GCHandle.Free if the event has never happened.

Given that, below is a version of the original timer callback code, which works correctly. Note, there is no need to put a strong hold on the timer callback delegate (WaitOrTimerCallbackProc callback). It is kept alive as a part of the state machine. Updated: as pointed out by @svick, this statement may be specific to the current implementation of the state machine (C# 5.0). I've added GC.KeepAlive(callback) to eliminate any dependency on this behavior, in case it changes in the future compiler versions.

using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication
{
    class Program
    {
        // Test task
        static async Task TestAsync(CancellationToken token)
        {
            using (var awaiter = new Awaiter())
            {
                WaitOrTimerCallbackProc callback = (a, b) =>
                    awaiter.Continue();
                try
                {
                    IntPtr timerHandle;
                    if (!CreateTimerQueueTimer(out timerHandle,
                            IntPtr.Zero,
                            callback,
                            IntPtr.Zero, 500, 500, 0))
                        throw new System.ComponentModel.Win32Exception(
                            Marshal.GetLastWin32Error());
                    try
                    {
                        var i = 0;
                        while (true)
                        {
                            token.ThrowIfCancellationRequested();
                            await awaiter;
                            Console.WriteLine("tick: " + i++);
                        }
                    }
                    finally
                    {
                        DeleteTimerQueueTimer(IntPtr.Zero, timerHandle, IntPtr.Zero);
                    }
                }
                finally
                {
                    // reference the callback at the end
                    // to avoid a chance for it to be GC'ed
                    GC.KeepAlive(callback);
                }
            }
        }

        // Entry point
        static void Main(string[] args)
        {
            // cancel in 3s
            var testTask = TestAsync(new CancellationTokenSource(10 * 1000).Token);

            Thread.Sleep(1000);
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true);

            Thread.Sleep(2000);
            Console.WriteLine("Press Enter to GC...");
            Console.ReadLine();

            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            Console.WriteLine("Press Enter to exit...");
            Console.ReadLine();
        }

        // Custom awaiter
        public class Awaiter :
            System.Runtime.CompilerServices.INotifyCompletion,
            IDisposable
        {
            Action _continuation;
            GCHandle _hold = new GCHandle();

            public Awaiter()
            {
                Console.WriteLine("Awaiter()");
            }

            ~Awaiter()
            {
                Console.WriteLine("~Awaiter()");
            }

            void ReleaseHold()
            {
                if (_hold.IsAllocated)
                    _hold.Free();
            }

            // resume after await, called upon external event
            public void Continue()
            {
                Action continuation;

                // it's OK to use lock (this)
                // the C# compiler would never do this,
                // because it's slated to work with struct awaiters
                lock (this)
                {
                    continuation = _continuation;
                    _continuation = null;
                    ReleaseHold();
                }

                if (continuation != null)
                    continuation();
            }

            // custom Awaiter methods
            public Awaiter GetAwaiter()
            {
                return this;
            }

            public bool IsCompleted
            {
                get { return false; }
            }

            public void GetResult()
            {
            }

            // INotifyCompletion
            public void OnCompleted(Action continuation)
            {
                lock (this)
                {
                    ReleaseHold();
                    _continuation = continuation;
                    _hold = GCHandle.Alloc(_continuation);
                }
            }

            // IDispose
            public void Dispose()
            {
                lock (this)
                {
                    _continuation = null;
                    ReleaseHold();
                }
            }
        }

        // p/invoke
        delegate void WaitOrTimerCallbackProc(IntPtr lpParameter, bool TimerOrWaitFired);

        [DllImport("kernel32.dll")]
        static extern bool CreateTimerQueueTimer(out IntPtr phNewTimer,
            IntPtr TimerQueue, WaitOrTimerCallbackProc Callback, IntPtr Parameter,
            uint DueTime, uint Period, uint Flags);

        [DllImport("kernel32.dll")]
        static extern bool DeleteTimerQueueTimer(IntPtr TimerQueue, IntPtr Timer,
            IntPtr CompletionEvent);
    }
}
noseratio
  • 59,932
  • 34
  • 208
  • 486
  • 1
    Very nice! I did notice the `unbox.any` opcode in the state machine but didn't think much of it.. given that it is output by the C# compiler for everything. Nice work mate. – Simon Whitehead Mar 10 '14 at 04:32
  • 1
    “Note, there is no need to put a strong hold on the timer callback delegate” I don't think you should rely on this. The state machine certainly isn't guaranteed to do it for you. – svick Mar 18 '14 at 19:34
  • @svick, in the generated code (the 2nd code fragment here) `WaitOrTimerCallbackProc callback` is a field of the state machine. It doesn't get gc'ed until the `TestAsync` is complete, even if GC is enforced with `GC.Collect`. This makes sense to me, because it never goes out of scope inside `TestAsync`. This is as good as making `WaitOrTimerCallbackProc callback` a `static` variable, which I'd assign to `null` when `TestAsync` finishes. – noseratio Mar 18 '14 at 21:37
  • 1
    @Noseratio Yes, but you're relying on how exactly does the compiler generate the state machine and future/alternative versions of the compiler might do it differently. And I think that the specification doesn't require this object to be kept alive (just like any local variable that's not used after a certain point). So, I think you shouldn't rely on this behavior at all. – svick Mar 18 '14 at 21:42
  • @Noseratio `callback` references `awaiter`, but not the other way around. So, `callback` can get collected, which means the native code that attempts to call it can fail with access violation, or something like that. So, that `KeepAlive()` is very important here. – svick Mar 18 '14 at 22:54
  • @svick, I created a simple fiddle to model this local variable scenario: http://dotnetfiddle.net/fq5XPT. Could you explain why `callback` doesn't get GC'ed inside the `while` loop? – noseratio Mar 18 '14 at 23:04
  • @svick, now I think you're probably right, but I can't explain the GC behavior I'm seeing. – noseratio Mar 18 '14 at 23:19
  • 2
    @Noseratio It's not guaranteed to be GCed. You're right that it won't be with the current C# compiler, because that local is going to be turned into a field on the state machine type. But the compiler doesn't have to do that. – svick Mar 19 '14 at 02:15
  • @svick, you're right! This behavior of the state machine doesn't match the behavior of a similar non-async method: http://stackoverflow.com/q/22495819/1768303. I've updated the answer to address this. Thanks for your input and persistence! :) – noseratio Mar 19 '14 at 08:15
  • 3
    @Noseratio Reading this question and answer for the second time and its absolutely amazing. One for the `async-await` pantheon! – Yuval Itzchakov Nov 16 '14 at 16:11