All!

I'm debugging a rather strange case of a process hanging and running out of memory, using a standard Windows crash dump in WinDbg. The process evidently runs out of address space because too many threads are created (it is a 32-bit process), and I'm trying to figure out what goes wrong during thread initialization (see call stack #3 below). Besides threads with call stacks that are typical for this program, the dump contains threads with three other types of call stack:

1)

00 02cefb08 77544413 02fc024c 00000000 02cefb8c ntdll_774f0000!NtWaitForAlertByThreadId+0xc
01 02cefb28 7754434d 00000000 00000000 ffffffff ntdll_774f0000!RtlpWaitOnAddressWithTimeout+0x33
02 02cefb6c 7754423f 00000004 00000000 00000000 ntdll_774f0000!RtlpWaitOnAddress+0xa5
03 02cefba8 7752a605 02fc0000 02fc0000 02fc04b0 ntdll_774f0000!RtlpWaitOnCriticalSection+0xaa
04 02cefbc8 7752a525 02fc0248 02cefc88 77533844 ntdll_774f0000!RtlpEnterCriticalSectionContended+0xd5
05 02cefbd4 77533844 02fc0248 62da3da7 02fc04b0 ntdll_774f0000!RtlEnterCriticalSection+0x45
06 02cefc88 77533688 02fc04b0 02fc04b8 00000007 ntdll_774f0000!RtlpFreeHeap+0x174
07 02cefcd8 110d27fc 02fc0000 00000000 02fc04b8 ntdll_774f0000!RtlFreeHeap+0x758
...

These threads are stuck behind critical section 02fc024c, which is held by a thread that no longer exists, and it is quite hard to figure out what happened to that thread.
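One small thing that can be read off the lock itself is its LockCount field. In frame 05 the first argument to RtlEnterCriticalSection is 02fc0248, so 02fc024c appears to be the LockCount dword at offset +4 (assuming the usual 32-bit `_RTL_CRITICAL_SECTION` layout; `dt ntdll!_RTL_CRITICAL_SECTION 02fc0248` would confirm it). The sketch below decodes a raw LockCount value using the bit layout Microsoft documents for Windows Vista and later. The function name and the sample value are illustrative and are not taken from this dump.

```python
def decode_lock_count(raw):
    """Decode RTL_CRITICAL_SECTION.LockCount (layout documented for Vista and later).

    bit 0      : 0 = locked, 1 = not locked
    bit 1      : 0 = a waiter has been woken, 1 = no waiter woken
    bits 2..31 : ones-complement of the number of waiting threads
    """
    value = raw & 0xFFFFFFFF          # accept signed or unsigned input
    return {
        "locked": (value & 0x1) == 0,
        "waiter_woken": (value & 0x2) == 0,
        "waiters": (~value & 0xFFFFFFFF) >> 2,
    }

# Illustrative value only (-22, i.e. 0xFFFFFFEA), not read from this dump.
print(decode_lock_count(0xFFFFFFEA))
```

Comparing the decoded waiter count with the number of threads showing call stack 1, and the OwningThread field with the thread list, is one way to confirm that the owner really is gone.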

Some threads are trying to exit normally but are stuck in LdrpDrainWorkQueue:

2)

 # ChildEBP RetAddr  Args to Child              
00 05e5fd54 77527631 00000064 00000000 00000000 ntdll_774f0000!NtWaitForSingleObject+0xc
01 05e5fd78 7752b105 65f13f5f 00404e7c 00000000 ntdll_774f0000!LdrpDrainWorkQueue+0xbd
02 05e5fe70 7755179c 00404e7c 00404e7c 1086eb50 ntdll_774f0000!LdrShutdownThread+0x85
03 05e5ff40 00404efe 00000000 0042cef4 0042cefc ntdll_774f0000!RtlExitUserThread+0x4c
04 05e5ff6c 00404ea6 05e5ffcc 004049b8 05e5ff80 abc!EndThread+0x6
05 05e5ff80 743962c4 1086eb50 743962a0 941a355e abc!ThreadWrapper+0x2a
06 05e5ff94 77550779 1086eb50 65f13ef3 00000000 kernel32!BaseThreadInitThunk+0x24
07 05e5ffdc 77550744 ffffffff 77573606 00000000 ntdll_774f0000!__RtlUserThreadStart+0x2f
08 05e5ffec 00000000 00404e7c 1086eb50 00000000 ntdll_774f0000!_RtlUserThreadStart+0x1b

The dump also contains about 1400 threads at a very early stage of initialization, all created during the last 5 minutes of the process's life, with a call stack like:

3)

 # ChildEBP RetAddr  Args to Child              
00 0ed1fba4 77527631 00000064 00000000 00000000 ntdll_774f0000!NtWaitForSingleObject+0xc
01 0ed1fbcc 7752b586 6ec53d9b ffffffff 1ed9d000 ntdll_774f0000!LdrpDrainWorkQueue+0xbd
02 0ed1fcb4 77557d86 6ec53c27 00000000 00000000 ntdll_774f0000!LdrpInitializeThread+0x8d
03 0ed1fd08 77557ce0 00000000 00000000 0ed1fd24 ntdll_774f0000!_LdrpInitialize+0x6a
04 0ed1fd10 00000000 0ed1fd24 774f0000 00000000 ntdll_774f0000!LdrInitializeThunk+0x10
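
As a rough sanity check on the address-space theory, the stack reservations of these threads alone can be estimated. The sketch below assumes the common 1 MiB default stack reserve per thread and a 2 GiB user address space (a 32-bit process that is not large-address-aware); neither number was read from this dump, and `!address -summary` would show the real figures.

```python
MIB = 1024 * 1024

def stack_reserve_bytes(thread_count, reserve_per_thread=1 * MIB):
    """Address space reserved for thread stacks (assumed default reserve)."""
    return thread_count * reserve_per_thread

def share_of_user_space(reserved_bytes, user_space_bytes=2048 * MIB):
    """Fraction of the (assumed 2 GiB) user address space that is reserved."""
    return reserved_bytes / user_space_bytes

reserved = stack_reserve_bytes(1400)   # the ~1400 threads stuck in initialization
print(f"{reserved // MIB} MiB reserved, "
      f"{share_of_user_space(reserved):.0%} of user address space")
# prints "1400 MiB reserved, 68% of user address space"
```

Under those assumptions the stuck threads alone account for roughly two thirds of the address space, before counting the heap, loaded modules and the program's regular threads.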

These threads are also waiting in LdrpDrainWorkQueue for the event LdrpLoadCompleteEvent to be signalled. This event is related to the parallel loader (for some reference, see the first answer to this question from RbMm; a somewhat similar yet different situation is described here). The event is created during process initialization and signalled after parallel DLL loading has finished, so that every LdrpInitializeThread can walk the DllMain routines and deliver DLL_THREAD_ATTACH. But I don't understand why it is in a non-signalled state in a process that has been running for weeks. Does the parallel loader also run on LoadLibrary, so that LdrpLoadCompleteEvent gets reset? I couldn't find that in the disassembly.

In any case, I'm trying to understand how the process developed such strange call stacks before it was forcefully terminated. I can imagine that some thread began loading a DLL, which caused LdrpLoadCompleteEvent to be reset; then a thread holding the heap lock died in a bad manner, so the DLL load could never complete and LdrpLoadCompleteEvent was never signalled, hence no new threads could be initialized. However, there is no thread in the dump that is loading a DLL.
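
With this many threads, a claim like "no thread is loading a DLL" is easier to verify mechanically than by eye. One option is to log the output of `~*kb` to a text file (`.logopen`, `~*kb`, `.logclose`) and bucket threads by their top frames, so that rare stacks stand out from the 1400 identical ones. The sketch below is a hypothetical helper, not a WinDbg feature; it assumes the 32-bit `kb` column layout shown in the stacks above and the usual `Id: pid.tid` thread header lines.

```python
import re
from collections import defaultdict

# Thread header printed by `~*kb`, e.g. "  12  Id: 1a2c.2f40 Suspend: 0 Teb: ..."
THREAD_RE = re.compile(r"^[.#]?\s*(\d+)\s+Id:\s*[0-9a-f]+\.([0-9a-f]+)", re.I)
# 32-bit `kb` frame: frame number, ChildEBP, RetAddr, three args, then the symbol
FRAME_RE = re.compile(r"^[0-9a-f]{2}\s+(?:[0-9a-f]{8}\s+){5}(\S+)", re.I)

def bucket_threads(dump_text, depth=3):
    """Group thread IDs by their top `depth` symbols (offsets stripped)."""
    buckets = defaultdict(list)
    tid, frames = None, []

    def flush():
        # Store the thread collected so far under its stack signature.
        if tid is not None:
            buckets[tuple(frames[:depth])].append(tid)

    for line in dump_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            flush()
            tid, frames = header.group(2), []
            continue
        frame = FRAME_RE.match(line.strip())
        if frame and tid is not None:
            frames.append(frame.group(1).split("+0x")[0])
    flush()
    return dict(buckets)

def rare_buckets(buckets, max_threads=5):
    """Signatures shared by only a few threads: the ones worth reading by hand."""
    return {sig: tids for sig, tids in buckets.items() if len(tids) <= max_threads}
```

A depth of 3 is enough to separate call stacks 2 and 3 above, which share their top two frames and differ at LdrShutdownThread versus LdrpInitializeThread. Any small bucket containing loader frames would be the first place to look for a load owner.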

Any insight or hint regarding how such call stacks could have developed, or what else I could do to squeeze more information out of the dump, is welcome.

Thank you!

  • Did you manage to solve this issue? I’m having the same problem where a thread is waiting for `LdrpLoadCompleteEvent` but it’s not getting set for some reason. `LdrpWorkInProgress` is 1 and the thread with `LoadOwner` set is waiting for `LdrpLoaderLock`. – Adam Schauer Dec 19 '22 at 21:45

1 Answer

Your fundamental problem is architectural ... a legendary problem known as "thrashing."

Your system is probably designed with what I refer to as the "flaming-arrow approach." Whenever a new request comes in, "just light another flaming arrow and throw it into the air." Unfortunately you just can't do that.

Unfortunately, your problem will never be permanently solved by a debugger: it will require a redesign.

Mike Robinson
  • This note is absolutely, outrageously unrelated to the nature of the problem described (this issue can happen in applications that do not process any requests and have only a couple of threads; see link #2 for example). Please refrain from off-topic notes. – Aleksey Ivchenko Aug 09 '21 at 03:36