0

I'm having an issue with a roughly 15y WinForms application, which is being actively maintained. It's being used by around ten companies, each running it on around 5 - 20 clients. Therein we are using a commercial printer driver (PDF X-Change) to create PDFs from printouts, which employs an asynchronous callback mechanism to inform the application that the PDF has been created (or not). The application uses this information to display the PDF file in a window. Basically something like this:

        void pdfCreationStatusCallback(PdfCreationStatusArgs pcsa)
        {
            if (InvokeRequired)
            {
                Invoke(new PdfCreationStatus(pdfCreationStatusCallback), pcsa);
                return;
            }

            displayPdf(pcsa);
        }

A new company started using the software and reported that the application hangs from time to time. Upon debugging I noticed that call to Invoke() sometimes takes between 20 seconds to almost two hours to return! I used remote debugging in combination with .NET Framework debugging (with dotPeek) to see what happens under the hood. I was able to trace it all the way down to the following code in WaitHandle.cs:

    [SecurityCritical]
    internal static bool InternalWaitOne(
      SafeHandle waitableSafeHandle,
      long millisecondsTimeout,
      bool hasThreadAffinity,
      bool exitContext)
    {
      if (waitableSafeHandle == null)
        throw new ObjectDisposedException((string) null, Environment.GetResourceString("ObjectDisposed_Generic"));

      int num = WaitHandle.WaitOneNative(waitableSafeHandle, (uint) millisecondsTimeout, hasThreadAffinity, exitContext);

      if (AppDomainPauseManager.IsPaused)
        AppDomainPauseManager.ResumeEvent.WaitOneWithoutFAS();

      if (num == 128)
        WaitHandle.ThrowAbandonedMutexException();

      return num != 258;
    }

Here I noticed that on a "healthy" system the call to WaitHandle.WaitOneNative(...) returns 0, thus the method returns true and all is good. On the affected system WaitHandle.WaitOneNative(...) returns 258 (after 1000 ms, which is the value of millisecondsTimeout), thus the method returns false. The method calling InternalWaitOne(..) then calls it in a loop for as long as it takes for WaitHandle.WaitOneNative(...) to finally return a value other than 258 (which as stated above, can even take two hours). My problem is, that I couldn't find the source code to WaitOneNative(..), only the following declaration:

    [SecurityCritical]
    [MethodImpl(MethodImplOptions.InternalCall)]
    private static extern int WaitOneNative(
      SafeHandle waitableSafeHandle,
      uint millisecondsTimeout,
      bool hasThreadAffinity,
      bool exitContext);

When I set a breakpoint at the line int num = WaitHandle.WaitOneNative(..) I have two threads which both hit that spot, both returning num = 258:

This is the stacktrace for the PDF-creating callback function (_pdfPrinter_OnFileSaved):

>   mscorlib.dll!System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle waitableSafeHandle, long millisecondsTimeout, bool hasThreadAffinity, bool exitContext) Line 193 C#
    mscorlib.dll!System.Threading.WaitHandle.WaitOne(int millisecondsTimeout, bool exitContext) Line 125    C#
    System.Windows.Forms.dll!System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle waitHandle) Line 2788   C#
    System.Windows.Forms.dll!System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control caller, System.Delegate method, object[] args, bool synchronous) Line 4825   C#
    System.Windows.Forms.dll!System.Windows.Forms.Control.Invoke(System.Delegate method, object[] args) Line 4551   C#
    GUISubsystem.dll!GUISubsystem.CalculationForms.Printouts.PrintoutAnbotsartikel.FrmAnbotsartikelDruckParameter.pdfCreationStatusCallback(PdfCreationStatusArgs pcsa) Line 597    C#
    PrinterSubsystem.dll!PrinterSubsystem.PrintingManager._pdfPrinter_OnFileSaved(int jobID, string fileName) Line 885  C#
    [Übergang von Nativ zu Verwaltet]   
    [Übergang von Verwaltet zu Nativ]   
    mscorlib.dll!System.Delegate.DynamicInvokeImpl(object[] args) Line 113  C#
    mscorlib.dll!System.Runtime.InteropServices.ComEventsMethod.DelegateWrapper.Invoke(object[] args) Line 158  C#
    mscorlib.dll!System.Runtime.InteropServices.ComEventsMethod.Invoke(object[] args) Line 121  C#
    mscorlib.dll!System.Runtime.InteropServices.ComEventsSink.System.Runtime.InteropServices.NativeMethods.IDispatch.Invoke(int dispid, ref System.Guid riid, int lcid, System.Runtime.InteropServices.ComTypes.INVOKEKIND wFlags, ref System.Runtime.InteropServices.ComTypes.DISPPARAMS pDispParams, System.IntPtr pvarResult, System.IntPtr pExcepInfo, System.IntPtr puArgErr) Line 153 C#

This is the stacktrace of the other thread (looks like the main message loop):

>   mscorlib.dll!System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle waitableSafeHandle, long millisecondsTimeout, bool hasThreadAffinity, bool exitContext) Line 193 C#
    mscorlib.dll!System.Threading.WaitHandle.WaitOne(int millisecondsTimeout, bool exitContext) Line 125    C#
    System.Windows.Forms.dll!System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle waitHandle) Line 2793   C#
    System.Windows.Forms.dll!System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control caller, System.Delegate method, object[] args, bool synchronous) Line 4825   C#
    System.Windows.Forms.dll!System.Windows.Forms.Control.Invoke(System.Delegate method, object[] args) Line 4551   C#
    System.Windows.Forms.dll!System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback d, object state) Line 74  C#
    System.dll!Microsoft.Win32.SystemEvents.SystemEventInvokeInfo.Invoke(bool checkFinalization, object[] args) Line 1272   C#
    System.dll!Microsoft.Win32.SystemEvents.RaiseEvent(bool checkFinalization, object key, object[] args) Line 970  C#
    System.dll!Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(int msg, System.IntPtr wParam, System.IntPtr lParam) Line 728   C#
    System.dll!Microsoft.Win32.SystemEvents.WindowProc(System.IntPtr hWnd, int msg, System.IntPtr wParam, System.IntPtr lParam) Line 1130   C#
    [Übergang von Nativ zu Verwaltet]   
    [Übergang von Verwaltet zu Nativ]   
    System.Windows.Forms.dll!System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(System.IntPtr dwComponentID, int reason, int pvLoopData) Line 1258    C#
    System.Windows.Forms.dll!System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(int reason, System.Windows.Forms.ApplicationContext context) Line 2042  C#
    System.Windows.Forms.dll!System.Windows.Forms.Application.ThreadContext.RunMessageLoop(int reason, System.Windows.Forms.ApplicationContext context) Line 1982   C#
    System.Windows.Forms.dll!System.Windows.Forms.Application.Run(System.Windows.Forms.Form mainForm) Line 988  C#
    MainEntry.exe!MainEntry.Main(string[] args) Line 118    C#

An additional observation: it only takes this long the first time Invoke() (and thus WaitOneNative()) is called. Once WaitOneNative() returns 0 (instead of 258) the application runs without any delays. This is "reset" once the application is restarted. This is the first time and the first company that is experiencing this problem on 3/7 clients. One of the clients is using Windows 10, the other two Windows 8.1 with the .Net Framework 4.8. Our application is targeting .Net Framework 4.5.

Can anybody tell me where I can find the source code for static extern int WaitOneNative(..), such that I can further investigate what it's waiting for? And/or does anybody have a hint what might be going on here, ie. what WaitOnNative() could be waiting for between a few seconds up to hours? What should I be looking for in the stacktraces, threads, etc.?

Krys
  • 39
  • 9
  • The return value of 258 indicates timeout. – NetMage Nov 09 '20 at 21:55
  • https://referencesource.microsoft.com/#mscorlib/system/threading/waithandle.cs,38951ac89a5d75ca,references – stuartd Nov 09 '20 at 21:59
  • Have you tried to just `BeginInvoke(new Action(() => displayPdf(pcsa)));`, so you'll skip the call to `WaitForWaitHandle()` (since it's not a synchronous call). What is `displayPdf(pcsa)` doing? Just `Send`ing some info? Can this info be `Post`ed instead? – Jimi Nov 09 '20 at 22:42
  • It has nothing to do with MultiThreaded code. If the PDF file is on network share then it will take it's time while the network resource is discovered and ready to access and it only happens on first access. In this case all you can do is wait. Alternatively you can copy the file locally in a background thread before doing invoke for your PDF processing method. – vendettamit Nov 09 '20 at 22:42
  • @vendettamit The Invokation just calls `displayPdf(pcsa)`, which looks like a simple status update (`void pdfCreationStatusCallback(PdfCreationStatusArgs pcsa)`). It doesn't look like there any network access done in the UI Thread here. Of course the OP should specify what this method is actually doing. – Jimi Nov 09 '20 at 22:46
  • As `OnUserPreferenceChanged` is on the call stack, this might be relevant: https://stackoverflow.com/questions/4077822/net-4-0-and-the-dreaded-onuserpreferencechanged-hang – Klaus Gütter Nov 10 '20 at 06:59
  • Thank you for the link @stuartd, but unfortunately it only contains the declaration for the method, but not its implementation (see line 278). – Krys Nov 10 '20 at 09:47
  • @vendettamit: As @Jimi already noticed, it has nothing to do with network shares. At the point where `Invoke()` is called, the PDF-creation process has successfully finished and the file is already saved locally for further processing. `displayPdf(pcsa)` as the name indicates, only opens up a new form in which the PDF is displayed, thus as @Jimi concluded is a _simple_ status update. But it simply takes "ages" until `pdfCreationStatusCallback()` is actually invoked.... – Krys Nov 10 '20 at 09:48
  • ...I tried to substitute `Invoke()` with `BeginInvoke()`, but the only difference it made was that it was not blocking, `pdfCreationStatusCallback()` still would be called delayed (same goes for `BeginInvoke(new Action(() => displayPdf(pcsa)))`). – Krys Nov 10 '20 at 09:48
  • @klaus-gütter: thanks for the tip, I'll be looking into it just now and report back on my findings. – Krys Nov 10 '20 at 09:48
  • How is that even possible? Isn't the PDF *already created and saved locally*? Are the `PdfCreationStatusArgs` class properties doing *something* when first used? Otherwise, you should just use the properties values and show their content. Or is the PDF visualizer that takes time? You should give some more context to what's happening here. You're not creating UI elements (or anything UI-related) in worker Threads, right? – Jimi Nov 10 '20 at 10:10
  • @Jimi, you are right, the pdf is already created, but it seems my explanation wasn't very clear. I'll try another way. Here is the sequence of events: 1. Call to `PDFXChange.CreatePdf(callback: pdfCreationStatusCallback)`, which does its magic on a non UI-thread and when finished... 2. ...calls `pdfCreationStatusCallback()` 3. `pdfCreationStatusCallback()` recognizes that it's not on the UI thread, thus calls `Invoke(new Action(() => displayPdf(pcsa)))` (or `BeginInvoke(..)`) which enqueues `displayPdf(pcsa)` in the UI thread's message queue. – Krys Nov 10 '20 at 13:44
  • 4. `displayPdf()` gets called on the UI Thread almost immediately on _regular_ systems, but up to 1.5h later on _affected_ ones. Irrespective of whether `Invoke()` or `BeginInvoke()` is used, the request to call `displayPdf(pcsa)` is put in a loop which only proceeds once `WaitOneNative()` return a value != 258. The only difference between `Invoke()` and `BeginInvoke()` is that the former waits until that happens (and freezes the UI) while the other doesn't. But in both cases, `displayPdf(pcsa)` is only executed after a _long_ time (because `WaitOneNative()` apparently waiting for ... – Krys Nov 10 '20 at 13:44
  • ....something to finish, which is blocking execution of anything else, including `displayPdf()`). Thus, it's not even clear that it has anything to do with `pdfCreationStatusCallback()` or `displayPdf()`. But it's a spot in code where this almost always (but also not always!) happens when you first start the application. – Krys Nov 10 '20 at 13:45
  • That's also the reason why I'd be interested to see what `WaitOneNative()` does _exactly_, as to figure out why it's constantly telling the UI Thread that it's waiting for something. – Krys Nov 10 '20 at 13:47
  • ...except that, when you use `BeginInvoke()`, `WaitForWaitHandle()` is **never** called (because this is **not a synchronous call**), so `WaitOneNative()` is **never** called. Instead, it captures the current Execution Context, calls PostMessage() using a a `ThreadCallbackMessage` as the callback and return a `ThreadMethodEntry` as an `IAsyncResult`. That's why I ask whether you have other Threads there and which is the actual Thread Context where this code should resume, thus where the Control that `BeginInvoke`s is actually created. It looks like some Thread cannot find its way home... – Jimi Nov 10 '20 at 14:15
  • ... or whether the `PdfCreationStatusArgs` objects, which is created *elsewhere*, takes actions of any kind instead or just returning *values*. – Jimi Nov 10 '20 at 14:18
  • I saw some posts suggesting that `Invoke` will lock if the UI Thread is already locked - is it possible something is happening on the UI Thread causing it to hang? – NetMage Nov 10 '20 at 18:35

1 Answers1

2

I was able to identify the problem. It was one of those OnUserPreferenceChanged issues @klaus-gütter mentioned above. We had a Windows.Forms.Form being created one one of the background threads. The pdfCreationStatusCallback callback was simply one of those that would hang due to the "bug" (it wasn't itself the culprit), probably because it was the first one to be called after the background thread mentioned earlier. I debugged it using the method described here: link (...there are a lot of dead links on this topic).

What I don't really understand is why it eventually would be executed (after 1-2h), as well as why it only affected only 3-4 computers out of around 70 using the software, but I can live with that.

Many thanks to everybody who commented on the original post, especially @klaus-gütter and @Jimi: it helped a great deal to know what to look for.

Krys
  • 39
  • 9
  • 2
    As, among other, I'm here to learn and improve, I'd be happy to understand why my answer got downvoted. Could the one (or anybody else) make me aware of the problem with it? – Krys Nov 12 '20 at 14:41