To debug a locked-file problem, we're calling Sysinternals' Handle64.exe 4.11 from a .NET process (via Process.Start with asynchronous output redirection). The calling process hangs on Process.WaitForExit because the Handle64 process doesn't exit (we waited for more than two hours).

We took a dump of the corresponding Handle64 process and checked it in the Visual Studio 2017 debugger. It shows two threads ("Main Thread" and "ntdll.dll!TppWorkerThread").

Main thread's call stack:

ntdll.dll!NtWaitForSingleObject()  Unknown
ntdll.dll!LdrpDrainWorkQueue()  Unknown
ntdll.dll!RtlExitUserProcess()  Unknown
kernel32.dll!ExitProcessImplementation()  Unknown
handle64.exe!000000014000664c()  Unknown
handle64.exe!00000001400082a5()  Unknown
kernel32.dll!BaseThreadInitThunk()  Unknown
ntdll.dll!RtlUserThreadStart()  Unknown

Worker thread's call stack:

ntdll.dll!NtWaitForSingleObject()  Unknown
ntdll.dll!LdrpDrainWorkQueue()  Unknown
ntdll.dll!LdrpInitializeThread()  Unknown
ntdll.dll!_LdrpInitialize()  Unknown
ntdll.dll!LdrInitializeThunk()  Unknown

My question is: Why would a process hang in LdrpDrainWorkQueue? From https://stackoverflow.com/a/42789684/62838, I gather that this is the Windows 10 parallel loader at work, but why would it get stuck while exiting the process? Can this be caused by how we invoke Handle64 from another process? I.e., are we doing something wrong or is this rather a bug in Handle64?

Fabian Schmied
  • Is there really only one worker thread in handle64? I would guess there are more, with different call stacks. Does this hang happen always on your system, or randomly? – RbMm Oct 04 '18 at 17:04
  • I can even say that this is not the call stack of a `LoaderWorker` thread: such a thread never calls `LdrpDrainWorkQueue` from `LdrpInitializeThread`, because these threads have a special, very light initialization. – RbMm Oct 04 '18 at 17:21
  • What I can say: at the time `ExitProcess` was called, some *DLL* was being loaded in another thread. The call stack of the worker thread is 100% not that of a loader worker thread, but of some other thread. Both the main thread and this new thread, which has only just begun executing (at a very early stage), wait on the `LdrpLoadCompleteEvent` event. This event is set in a single place, `LdrpDropLastInProgressCount`, when the load of some *DLL* completes. There must be additional threads in the process; this needs a closer look in a debugger. As a side note, it is easily possible not to use handle64 at all and to get all the info yourself. – RbMm Oct 04 '18 at 19:07
  • @RbMm Thanks a lot for the analysis! According to the debugger there was only that one worker thread in Handle64.exe when the dump was taken. (Maybe the other threads have already exited?) – Fabian Schmied Oct 05 '18 at 05:56
  • It doesn't always occur; we've seen it only once so far in a few dozen calls. – Fabian Schmied Oct 05 '18 at 05:56
  • "as side note, easy possible not use handle64 at all, but get all info yourself" - well, we're not really system programmers, and this instrumentation was only added to diagnose another problem, so it would probably be a lot of effort and complexity for us to implement handle inspection ourselves, wouldn't it? – Fabian Schmied Oct 05 '18 at 05:58
  • You say there were only 2 threads in the process at this point? Very strange. It would of course be interesting to look at this under a debugger, but the problem is catching such a situation. Enumerating handles in the system, getting info about them, etc. is a very simple task. The only problem is if you need handles from protected processes; that needs driver support (handle64 uses *PROCEXP152.SYS*). *diagnose another problem* - what problem? – RbMm Oct 05 '18 at 09:36
  • "what problem?" - in our system, deleting files programmatically sometimes fails with different errors (cannot delete a directory because a file is in use, cannot delete a directory because it isn't empty, cannot create a file due to unauthorized access), but it works if retried a few seconds later. We suspected some other process might be locking them, so we tried to invoke Handle64.exe as a quick way of determining if that was the case. But this is probably a topic for another question :) – Fabian Schmied Oct 09 '18 at 06:56
  • For a specific file, you don't need Handle64.exe; use [`FileProcessIdsUsingFileInformation`](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/ne-wdm-_file_information_class) instead. – RbMm Oct 09 '18 at 07:53
  • Anyway, unrelated to your real problem, it would be interesting to catch one of these hangs and investigate it. – RbMm Oct 09 '18 at 08:00

1 Answer

How long did you wait?

According to this analysis,

The worker thread idle timeout is set to 30 seconds. Programs which execute in less than 30 seconds will appear to hang due to ntdll!TppWorkerThread waiting for the idle timeout before the process terminates.

I would recommend setting the registry value specified in that article to disable the parallel loader and seeing whether this resolves the issue.

Parent Key: HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\handle64.exe
Value Name: MaxLoaderThreads
Type: DWORD
Value: 1 to disable
Dark Falcon
  • We waited for more than 2 hours, I'll update the question accordingly. – Fabian Schmied Oct 04 '18 at 14:52
  • We can't currently reliably reproduce this issue (it only occurred once in a few hundred calls so far), so disabling parallel loading to see if it resolves the issue is difficult. – Fabian Schmied Oct 04 '18 at 14:54
  • Without easy reproducibility, I would be tempted to just add a timeout to your `WaitForExit` and just give up on that process and try running it again. I think it unlikely you'll find anyone on SO who will be able to help you fix it. Good luck! – Dark Falcon Oct 04 '18 at 15:04
  • *Programs which execute in less than 30 seconds will appear to hang* - this is of course not true. A worker thread running by itself does not lead to a hang; it does not hold any "critical section" (in the wide sense). If a thread simply exits without calling `ExitProcess`, the process of course does not exit until these worker threads exit, but that is not a hang. And if we call `ExitProcess`, we exit normally. – RbMm Oct 04 '18 at 16:58
  • @DarkFalcon "I would be tempted to just add a timeout to your WaitForExit" - That's exactly what we've done - this question is mostly for understanding the problem better. :) – Fabian Schmied Oct 05 '18 at 05:52
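
The timeout-and-retry workaround discussed in the comments above can be sketched roughly as follows. This is only an illustration in Python, not the .NET code from the question (which uses `Process.Start` and `Process.WaitForExit`); the function name `run_with_timeout` and its parameters are made up for this example, and the timeout and retry count are arbitrary assumptions.

```python
import subprocess


def run_with_timeout(cmd, timeout_s, retries=1):
    """Run cmd, giving up on it and retrying if it does not exit in time.

    Hypothetical helper for illustration only.
    Returns (returncode, stdout) from the first attempt that finishes,
    or None if every attempt timed out.
    """
    for _attempt in range(retries + 1):
        try:
            # When the timeout expires, subprocess.run kills the child,
            # waits for it, and then raises TimeoutExpired.
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
            return result.returncode, result.stdout
        except subprocess.TimeoutExpired:
            # The child was stuck (e.g. hanging during process exit);
            # abandon this attempt and start a fresh process.
            continue
    return None
```

A call might look like `run_with_timeout(["handle64.exe", some_path], timeout_s=60)`, where `some_path` and the 60-second limit are placeholders. This does not explain the hang in `LdrpDrainWorkQueue`; it only keeps the calling process from blocking indefinitely when the hang occurs.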