My attempt
I created a minimal, CRT-free executable with as few dependencies as possible in Microsoft Visual Studio by specifying the /GS- compiler flag and the /NoDefaultLib linker flag, and by naming the main function mainCRTStartup. The application does not create additional threads and returns from mainCRTStartup after less than 5 seconds, but the process takes 30 seconds in total to terminate.
Problem description
In my experience, if an application running on Windows 10 depends only on the dynamic libraries that are loaded by default into every Windows process, namely ntdll.dll, KernelBase.dll and kernel32.dll, the process exits normally when the main thread returns from the mainCRTStartup function.
If other libraries are loaded, statically or dynamically (e.g. by calling LoadLibraryW), returning from the main function leaves the process alive: for 30 seconds when run normally and indefinitely when run under a debugger.
Context
On process creation, the Windows 10 process loader creates additional threads to load dynamic libraries faster, see:
- Why does Windows 10 start extra threads in my program?
- Why there are three unexpected worker threads when a Win32 console application starts up?
Cylance mentions in Windows 10 Parallel Loading Breakdown:
The worker thread idle timeout is set to 30 seconds. Programs which execute in less than 30 seconds will appear to hang due to ntdll!TppWorkerThread waiting for the idle timeout before the process terminates.
Microsoft mentions in Terminating a Process: How Processes are Terminated:
Note that some implementation of the C run-time library (CRT) call ExitProcess if the primary thread of the process returns.
On the other hand, Microsoft mentions in ExitProcess:
Note that returning from the main function of an application results in a call to ExitProcess.
Test code
This is the minimal test code I worked with. I used kernel32!CloseHandle and user32!CloseWindow as examples; the calls to them do not actually do anything:
    #include <cstdint>

    namespace windows {
        typedef const intptr_t Handle;
        typedef const void * Module;
        constexpr Handle InvalidHandleValue = -1;

        namespace kernel32 {
            extern "C" uint32_t __stdcall CloseHandle(Handle);
            extern "C" uint32_t __stdcall FreeLibrary(Module);
            extern "C" Module __stdcall LoadLibraryW(const wchar_t *);
        }

        namespace user32 {
            extern "C" uint32_t __stdcall CloseWindow(Handle);
        }
    }

    int mainCRTStartup() {
        // 0 seconds
        // windows::kernel32::CloseHandle(windows::InvalidHandleValue);

        // 30 seconds
        // windows::user32::CloseWindow(windows::InvalidHandleValue);

        // 0 seconds
        // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L"kernel32.dll"));

        // 30 seconds
        // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L"user32.dll"));

        // 0 seconds
        // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L""));

        return 0;
    }
Debugging
Uncommenting one of the WinAPI calls in the mainCRTStartup function results in the exit delay noted in the comment above that call.
This is the execution flow of the program traced in a debugger in pseudo C++:
    ntdll.RtlUserThreadStart() {
        kernel32.BaseThreadInitThunk() {
            const auto return_code = test.mainCRTStartup();
            ntdll.RtlExitUserThread(return_code) {
                if (ntdll.NtQueryInformationThread(CURRENT_THREAD, ThreadAmILastThread) != STATUS_SUCCESS || !AmILastThread) {
                    // Bad path - for `30 seconds`.
                    ntdll.LdrShutdownThread();
                    ntdll.TpCheckTerminateWorker(0);
                    ntdll.NtTerminateThread(0, return_code);
                    // The thread execution does not return from `NtTerminateThread`, but the process still runs.
                } else {
                    // Good path - for `0 seconds`.
                    ntdll.RtlExitUserProcess(return_code) {
                        ntdll.EtwpShutdownPrivateLoggers();
                        ntdll.LdrpDrainWorkQueue(0);
                        ntdll.LdrpAcquireLoaderLock();
                        ntdll.RtlEnterCriticalSection(ntdll.FastPebLock);
                        ntdll.RtlLockHeap(peb.ProcessHeap);
                        ntdll.NtTerminateProcess(0, return_code);
                        ntdll.RtlUnlockProcessHeapOnProcessTerminate();
                        ntdll.RtlLeaveCriticalSection(ntdll.FastPebLock);
                        ntdll.RtlReportSilentProcessExit(CURRENT_PROCESS, return_code);
                        ntdll.LdrShutdownProcess();
                        ntdll.NtTerminateProcess(CURRENT_PROCESS, return_code);
                        // The thread execution does not return from `NtTerminateProcess` and the process is terminated.
                    }
                }
            }
        }
    }
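The branch at the top of the trace can be condensed into a small portable model. This is only a sketch of the decision as traced above, not ntdll code: ExitOutcome, classify_exit, expected_delay_seconds and kWorkerIdleTimeoutSeconds are hypothetical names, and the 30-second figure is simply the worker idle timeout quoted from the Cylance article, assuming the only remaining threads are idle loader workers.

```cpp
#include <cstdint>

// Hypothetical model of the branch in ntdll.RtlExitUserThread traced above.
enum class ExitOutcome {
    ProcessTerminates,    // "good path": RtlExitUserProcess is reached
    OnlyThreadTerminates  // "bad path": only the calling thread ends
};

// query_succeeded:  NtQueryInformationThread returned STATUS_SUCCESS.
// am_i_last_thread: the value reported for ThreadAmILastThread.
constexpr ExitOutcome classify_exit(bool query_succeeded, bool am_i_last_thread) {
    return (query_succeeded && am_i_last_thread)
        ? ExitOutcome::ProcessTerminates
        : ExitOutcome::OnlyThreadTerminates;
}

// Assumption: the threads left behind are idle loader workers that
// exit after the 30-second idle timeout mentioned in the Context section.
constexpr uint32_t kWorkerIdleTimeoutSeconds = 30;

constexpr uint32_t expected_delay_seconds(ExitOutcome outcome) {
    return outcome == ExitOutcome::ProcessTerminates ? 0u : kWorkerIdleTimeoutSeconds;
}
```

In this model, the 30-second cases from the test code correspond to classify_exit(true, false): the query succeeds, but the loader worker threads still count as live threads, so the main thread is not the last one.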
Expected results
I expected the process to terminate if it does not create additional threads and returns from the main function.
Calling ExitProcess at the end of the main function terminates the process immediately, even when WinAPI calls were made that previously led to the 30-second delay. Using this API is not always possible, because the problematic application might not be mine but a third-party application (even one from Microsoft), as here: Why would a process hang within RtlExitUserProcess/LdrpDrainWorkQueue?
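For reference, the ExitProcess workaround looks roughly like the sketch below. It follows the declaration style of the test code; run_application is a hypothetical placeholder for the real program logic, and the non-Windows branch is only a stand-in (with the hypothetical g_recorded_exit_code) so that the sketch compiles and can be checked on other platforms.

```cpp
#include <cstdint>

#ifdef _WIN32
// Declared by hand because the executable is built without the CRT or windows.h.
extern "C" void __stdcall ExitProcess(uint32_t exit_code);
#else
// Stand-in for non-Windows builds: records the exit code instead of exiting.
static uint32_t g_recorded_exit_code = 0xFFFFFFFFu;
static void ExitProcess(uint32_t exit_code) { g_recorded_exit_code = exit_code; }
#endif

// Hypothetical placeholder for the actual work of the program.
static int run_application() {
    return 0;
}

int mainCRTStartup() {
    const int return_code = run_application();
    // Terminate the whole process explicitly instead of returning into
    // RtlExitUserThread, which would only end the calling thread while
    // other threads are still alive.
    ExitProcess(static_cast<uint32_t>(return_code));
    return return_code;  // not reached on Windows
}
```

This only helps when the source of the application can be changed, which is exactly the limitation described above.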
It seems to me that the Windows 10 process loader is broken, if even Microsoft processes behave incorrectly.
- Is there a clean solution to this problem?
- What are those loader threads needed for once the last user-created thread has exited? AFAIK it is impossible at this point to load any other libraries.