
I'm facing a problem with a .NET server application, which crashes on an almost weekly basis in a "GC Finalizer Thread", more exactly at line 798 of "mscorlib.dll ...~DestroyScout()", according to Visual Studio.

Visual Studio also tries to open the file "DynamicILGenerator.cs". I don't have this file, but I've found a version of it, where line 798 indeed is inside the destructor of the DestroyScout class (whatever that might mean).

I have the following information in my Visual Studio environment:

Threads:

Not Flagged >   5892    0   Worker Thread   GC Finalizer Thread mscorlib.dll!System.Reflection.Emit.DynamicResolver.DestroyScout.~DestroyScout

Call stack:

    [Managed to Native Transition]  
>   mscorlib.dll!System.Reflection.Emit.DynamicResolver.DestroyScout.~DestroyScout() Line 798   C#
[Native to Managed Transition]  
kernel32.dll!@BaseThreadInitThunk@12()  Unknown
ntdll.dll!__RtlUserThreadStart()    Unknown
ntdll.dll!__RtlUserThreadStart@8()  Unknown

Locals (no way to be sure if that $exception object is correct):

+       $exception  {"Exception of type 'System.ExecutionEngineException' was thrown."} System.ExecutionEngineException
    this    Cannot obtain value of the local variable or argument because it is not available at this instruction pointer,
            possibly because it has been optimized away.    System.Reflection.Emit.DynamicResolver.DestroyScout
    Stack objects   No CLR objects were found in the stack memory range of the current frame.   

Source code of "DynamicILGenerator.cs" showing the DestroyScout class (line 798 is marked with a comment):

    private class DestroyScout
    {
        internal RuntimeMethodHandleInternal m_methodHandle;

        [System.Security.SecuritySafeCritical]  // auto-generated
        ~DestroyScout()
        {
            if (m_methodHandle.IsNullHandle())
                return;

            // It is not safe to destroy the method if the managed resolver is alive.
            if (RuntimeMethodHandle.GetResolver(m_methodHandle) != null)
            {
                if (!Environment.HasShutdownStarted &&
                    !AppDomain.CurrentDomain.IsFinalizingForUnload())
                {
                    // Somebody might have been holding a reference on us via weak handle.
                    // We will keep trying. It will be hopefully released eventually.
                    GC.ReRegisterForFinalize(this);
                }
                return;
            }

            RuntimeMethodHandle.Destroy(m_methodHandle); // <===== line 798
        }
    }

Watch window (m_methodHandle):

m_methodHandle  Cannot obtain value of the local variable or argument because 
                it is not available at this instruction pointer,
                possibly because it has been optimized away.
                System.RuntimeMethodHandleInternal

General dump module information:

Dump Summary
------------
Dump File:  Application_Server2.0.exe.5296.dmp : C:\Temp_Folder\Application_Server2.0.exe.5296.dmp
Last Write Time:    14/06/2022 19:08:30
Process Name:   Application_Server2.0.exe : C:\Runtime\Application_Server2.0.exe
Process Architecture:   x86
Exception Code: 0xC0000005
Exception Information:  The thread tried to read from or write to a virtual address
                        for which it does not have the appropriate access.
Heap Information:   Present

System Information
------------------
OS Version: 10.0.14393
CLR Version(s): 4.7.3920.0

Modules
-------
Module Name                                           Module Path   Module Version
-----------                                           -----------   --------------
...
clr.dll     C:\Windows\Microsoft.NET\Framework\v4.0.30319\clr.dll       4.7.3920.0
...

Be aware: the dump was taken on a Windows Server 2016 computer, but I'm investigating it on my Windows 10 environment (so don't be misled by the OS Version in the dump summary)!

Edit

What might the DestroyScout be trying to destroy? That would be very interesting to know.

Dominique
  • to me it seems like a race condition in a multithreaded scenario, where multiple threads dispose of the same object handle – Michael Schönbauer Jun 15 '22 at 13:15
  • A race condition? In a piece of source I don't have access to? Any way to get this solved (is this a known bug, is it possible to follow the progress of it, ...)? – Dominique Jun 15 '22 at 13:17
  • Looking again, I think it has nothing to do with multithreading, but it obviously seems like a bug, yes. Re-registering `this` for finalization within the destructor might result in the GC calling the destructor again... what is this?? That's a strange way of destroying objects in C#, waiting until all weak references have given up their handle. I have no idea, sorry for commenting – Michael Schönbauer Jun 15 '22 at 13:18
  • Have you tried using a newer version of .NET Framework? `ExecutionEngineException` indicates to me probably some kind of corrupted memory, which happens to only manifest at finalization. Are you using `unsafe` or PInvoke? – Charlieface Jun 15 '22 at 13:21
  • @MichaelSchönbauer: don't feel sorry for trying :-) – Dominique Jun 15 '22 at 13:46
  • @Charlieface: I just checked all the source code. I have found three instances of `unsafe`, but all of them are inside a piece of code which is not used here. `PInvoke` is never used. You mention upgrading my .NET Framework: suppose I did that, how can I know which .NET Framework version solves this issue? – Dominique Jun 15 '22 at 13:48
  • The fact that `unsafe` is not used *here* doesn't mean it doesn't have a bug: it may be overwriting memory it shouldn't, with the effect only appearing here. I'm not aware of this bug (if it is a bug) and can't find any documentation on it, just suggesting you try upgrading the framework – Charlieface Jun 15 '22 at 13:50
  • You *might* have a reason to thoroughly review System.Reflection.Emit code in the codebase. But this is a memory corruption problem that can strike anywhere, anytime. Clearly the CLR version is badly outdated, one thing you never want to do with trouble like this is preventing stability and security updates from being deployed. – Hans Passant Jun 15 '22 at 13:55
  • @HansPassant: sorry for the long delay, but I currently have a similar problem again. Again the CLR version is mentioned as 4.7.something, in this case 4.7.3946.0. You mention it being badly outdated, but I believe the CLR version is not part of the dump file but of my own system, so it can't be related to the crash I'm facing. Am I correct? (Sorry for my ignorance) – Dominique Aug 22 '22 at 08:14
  • Look at the following code: https://referencesource.microsoft.com/#mscorlib/system/reflection/emit/dynamicilgenerator.cs,3bee9e4a662d474d I would say you've gone a bit overboard with dynamic IL generation. Try to ensure there's no new IL stuff during shutdown. – Hylaean Aug 22 '22 at 16:49
  • @Dominique: Did you already try different [GC modes](https://learn.microsoft.com/en-us/dotnet/core/runtime-config/garbage-collector) like server or workstation? Maybe this helps to narrow down the problem. – Fabian Aug 26 '22 at 09:11
  • @Fabian: Sorry, but I never heard of any garbage collection configuration. Do you have any idea which configuration setting might influence the behaviour I'm describing in my question? In my system I have found `gcServer` and `gcConcurrent` entries (see the config sketch after this comment thread). – Dominique Aug 26 '22 at 09:17
  • @Fabian: the answer by https://stackoverflow.com/users/16587692/teodor-mihail mentions GC optimization. Is there a setting which suppresses GC optimization? – Dominique Aug 26 '22 at 09:28
  • @Dominique: I am really no expert on the topic, but since the idea of a race condition floated around I remembered that there is a concurrent and a non-concurrent GC mode. From [here](https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcserver-element) I see, however, that your settings already select non-concurrent server garbage collection. Then again, [this article](https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcconcurrent-element) states that the machine configuration file overrides the application config. – Fabian Aug 26 '22 at 09:29
  • Concerning the "optimized away": I do not think that the GC optimizes it away. This message refers to optimizations of release mode dlls that does not allow the debugger to find the value of the property. – Fabian Aug 26 '22 at 09:33
  • @Fabian: do you have any idea where I might find the machine's configuration? (I tried searching all files on the machine for the setting, but that seems to be a bad idea :-) ). Or is it somewhere in the registry? – Dominique Aug 26 '22 at 09:37
  • Concerning the GC settings: please check if the [Machine.Config](https://stackoverflow.com/questions/2325473/where-is-machine-config) has a gcConcurrent setting. This will override the application.config settings. – Fabian Aug 26 '22 at 09:38
  • @Fabian: I have found four Machine.config files. None of them contained any "gcCon..." entry. – Dominique Aug 26 '22 at 09:41
  • @Fabian: I think we can conclude that my machine is set NOT to be GC-concurrent. Any way this might cause the issue I'm having here? (Sorry for my ignorance but as stated before I never heard of GC configuration before) – Dominique Aug 26 '22 at 09:46
  • @Charlieface: what do you mean by using `unsafe` or `PInvoke`? (Sorry for my ignorance, but I have no idea what you're talking about.) – Dominique Aug 26 '22 at 09:49
  • @Dominique: You could try the other combinations of the gcServer and gcConcurrent. But the problem may very well be unrelated to the GC Settings. – Fabian Aug 26 '22 at 09:54
  • @Fabian: Hmm, I can't do trial-and-error: the issue happens on a customer system and the problem seems to occur randomly: the customer won't agree and even if the customer would agree, I would not know when I can decide that a trial is successful or not (the last crash happened more than two months ago). – Dominique Aug 26 '22 at 10:02
  • [`unsafe`](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/unsafe) is a C# keyword, and means you get to muck around with pointers. Using PInvoke means you are calling into native APIs using the [`[DllImport]` attribute](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.dllimportattribute?view=net-6.0). If you are using either of these you could be open to memory corruption if not done correctly. Once you get memory corruption it could manifest anywhere; the exact location is probably not actually relevant. The .NET version is also a concern – Charlieface Aug 26 '22 at 10:27
  • @Charlieface: I've investigated the entire code, the words `unsafe` and `PInvoke` are not present in the source code. – Dominique Aug 26 '22 at 11:24
  • You would be looking for `DllImport` not `PInvoke`. Again: have you tried upgrading the .NET version? A race condition is also a concern: be aware that a race condition that you create could corrupt memory you don't know about, for example if you access a function that is not thread-safe and cause a torn read/write. – Charlieface Aug 26 '22 at 11:39
  • @Charlieface: upgrading the .NET version is not an option (the customer is very reluctant towards updates), and the only `DllImport` inside the source code is the following line: `[DllImport("user32.dll")]`. – Dominique Aug 26 '22 at 13:56
  • After that, the external command `ShutdownBlockReasonCreate(...)` is mentioned. – Dominique Aug 26 '22 at 14:03
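
For reference, the gcServer / gcConcurrent entries discussed in the comments above live in the `<runtime>` section of the application's App.config (per the linked docs, Machine.config can override the gcConcurrent setting). The snippet below is only a sketch of what such a configuration looks like, not the OP's actual file:

    <!-- App.config sketch: server GC enabled, concurrent GC disabled -->
    <configuration>
      <runtime>
        <gcServer enabled="true"/>
        <gcConcurrent enabled="false"/>
      </runtime>
    </configuration>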

1 Answer


I don't know what exactly is causing this crash, but I can tell you what DestroyScout does.

It's related to creating dynamic methods. The class DynamicResolver needs to clean up related unmanaged memory, which is not tracked by the GC, but that memory cannot be freed until there are definitely no references to the method anymore.

However, malicious (or outright weird) code can hold a long weak reference that survives a GC and thereby resurrect a reference to the dynamic method after its finalizer has run. Hence DestroyScout, with its strange GC.ReRegisterForFinalize logic, exists to ensure the unmanaged part is only destroyed once nothing can reference the method anymore.

It's explained in a comment in the source code:

// We can destroy the unmanaged part of dynamic method only after the managed part is definitely gone and thus
// nobody can call the dynamic method anymore. A call to finalizer alone does not guarantee that the managed 
// part is gone. A malicious code can keep a reference to DynamicMethod in long weak reference that survives finalization,
// or we can be running during shutdown where everything is finalized.
//
// The unmanaged resolver keeps a reference to the managed resolver in long weak handle. If the long weak handle 
// is null, we can be sure that the managed part of the dynamic method is definitely gone and that it is safe to 
// destroy the unmanaged part. (Note that the managed finalizer has to be on the same object that the long weak handle 
// points to in order for this to work.) Unfortunately, we can not perform the above check when out finalizer 
// is called - the long weak handle won't be cleared yet. Instead, we create a helper scout object that will attempt 
// to do the destruction after next GC.
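
For context, the mechanism described above (and in the quoted comment) is what backs System.Reflection.Emit's DynamicMethod. Below is a minimal, self-contained sketch, unrelated to the OP's application, of a dynamic method whose unmanaged resolver state is exactly the kind of thing a DestroyScout later has to release:

    using System;
    using System.Reflection.Emit;

    class DynamicMethodSketch
    {
        static void Main()
        {
            // Emit a tiny "add two ints" method at runtime. The JIT-compiled code and
            // its resolver live partly in unmanaged memory that the GC does not track.
            var dm = new DynamicMethod("Add", typeof(int), new[] { typeof(int), typeof(int) });
            ILGenerator il = dm.GetILGenerator();
            il.Emit(OpCodes.Ldarg_0);
            il.Emit(OpCodes.Ldarg_1);
            il.Emit(OpCodes.Add);
            il.Emit(OpCodes.Ret);

            var add = (Func<int, int, int>)dm.CreateDelegate(typeof(Func<int, int, int>));
            Console.WriteLine(add(2, 3)); // prints 5

            // Once 'dm' and 'add' become unreachable, the managed resolver is finalized
            // and a DestroyScout eventually calls RuntimeMethodHandle.Destroy on the
            // unmanaged part -- the line where the OP's dump crashes.
        }
    }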

As to your crash: it is happening in internal CLR code and results in an ExecutionEngineException. That most likely indicates memory corruption, i.e. memory being used in a way it wasn't supposed to be.

Memory corruption can happen for a number of reasons. In order of likelihood:

  • Incorrect use of PInvoke to native Win32 functions (DllImport and associated marshalling); see the sketch after this list.
  • Incorrect use of unsafe (including library classes such as Unsafe and Buffer which do the same thing).
  • Multi-threaded race conditions on objects which the Runtime does not expect to be used multi-threaded. This can cause such problems as torn reads and memory-barrier violations.
  • A bug in .NET itself. This can be the easiest to exclude: just upgrade to the latest build.
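
To illustrate the first two bullets, here is a hedged sketch (hypothetical code, not taken from the OP's application) of how a DllImport call that lies about a buffer size can overrun the managed heap. Nothing fails at the call site; the corruption only surfaces later, for example in a finalizer, as an access violation or ExecutionEngineException:

    using System;
    using System.Runtime.InteropServices;
    using System.Text;

    class PInvokeCorruptionSketch
    {
        [DllImport("user32.dll", CharSet = CharSet.Unicode)]
        static extern int GetWindowText(IntPtr hWnd, StringBuilder text, int maxCount);

        [DllImport("user32.dll")]
        static extern IntPtr GetForegroundWindow();

        static string GetTitleBuggy(IntPtr hwnd)
        {
            // BUG: the marshaller sizes the native buffer from the StringBuilder's
            // capacity (16), but we tell the API it may write up to 512 characters.
            // A long window title overruns the buffer and corrupts nearby heap memory.
            var sb = new StringBuilder(16);
            GetWindowText(hwnd, sb, 512);
            return sb.ToString();
        }

        static string GetTitleCorrect(IntPtr hwnd)
        {
            // The capacity and the count passed to the API must agree.
            var sb = new StringBuilder(512);
            GetWindowText(hwnd, sb, sb.Capacity);
            return sb.ToString();
        }

        static void Main()
        {
            IntPtr hwnd = GetForegroundWindow();
            Console.WriteLine(GetTitleCorrect(hwnd));
            // GetTitleBuggy(hwnd) may appear to work, yet quietly damage the heap.
        }
    }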

Consider submitting the crash report to Microsoft for investigation.

Edit from the author:
In order to submit a crash report to Microsoft, the following URL can be used: https://www.microsoft.com/en-us/unifiedsupport. Take into account that this is a paid service and that you might need to deliver your entire source code to Microsoft in order to get a full analysis of your crash dump.

Dominique
Charlieface
  • I particularly love the idea where you propose to send the crash dump to Microsoft. Maybe they'll see something I didn't. – Dominique Aug 30 '22 at 10:13