My attempt

I created a minimal, CRT-free executable with as few dependencies as possible in Microsoft Visual Studio by specifying the /GS- compiler flag and the /NODEFAULTLIB linker flag, and naming the main function mainCRTStartup. The application does not create additional threads and returns from mainCRTStartup in under 5 seconds, but the process takes 30 seconds in total to terminate.

Problem description

In my experience, if an application running on Windows 10 depends only on the dynamic libraries that are loaded by default into every Windows process, namely ntdll.dll, KernelBase.dll and kernel32.dll, the process exits normally when the main thread returns from the mainCRTStartup function.

If other libraries are loaded, statically or dynamically (e.g. by calling LoadLibraryW), returning from the main function will leave the process alive: for 30 seconds when run normally and indefinitely when run under a debugger.

Context

On process creation, the Windows 10 process loader creates additional threads to load dynamic libraries faster.

Cylance mentions in Windows 10 Parallel Loading Breakdown:

The worker thread idle timeout is set to 30 seconds. Programs which execute in less than 30 seconds will appear to hang due to ntdll!TppWorkerThread waiting for the idle timeout before the process terminates.

Microsoft mentions in Terminating a Process: How Processes are Terminated:

Note that some implementation of the C run-time library (CRT) call ExitProcess if the primary thread of the process returns.

On the other hand, Microsoft mentions in ExitProcess:

Note that returning from the main function of an application results in a call to ExitProcess.

Test code

This is the minimal test code I worked with. I used kernel32!CloseHandle and user32!CloseWindow as examples; the calls to them do not actually do anything:

#include <cstdint>

namespace windows {
    typedef const intptr_t Handle;
    typedef const void *   Module;

    constexpr Handle InvalidHandleValue = -1;

    namespace kernel32 {
        extern "C" uint32_t __stdcall CloseHandle(Handle);
        extern "C" uint32_t __stdcall FreeLibrary(Module);
        extern "C" Module   __stdcall LoadLibraryW(const wchar_t *);
    }

    namespace user32 {
        extern "C" uint32_t __stdcall CloseWindow(Handle);
    }
}

int mainCRTStartup() {
    // 0 seconds
    // windows::kernel32::CloseHandle(windows::InvalidHandleValue);

    // 30 seconds
    // windows::user32::CloseWindow(windows::InvalidHandleValue);

    // 0 seconds
    // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L"kernel32.dll"));

    // 30 seconds
    // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L"user32.dll"));

    // 0 seconds
    // windows::kernel32::FreeLibrary(windows::kernel32::LoadLibraryW(L""));

    return 0;
}

Debugging

Uncommenting one of the WinAPI calls in the mainCRTStartup function results in the execution time noted in the comment above that call.

This is the execution flow of the program traced in a debugger in pseudo C++:

ntdll.RtlUserThreadStart() {
    kernel32.BaseThreadInitThunk() {
        const auto return_code = test.mainCRTStartup();

        ntdll.RtlExitUserThread(return_code) {
            if (ntdll.NtQueryInformationThread(CURRENT_THREAD, ThreadAmILastThread) != STATUS_SUCCESS || !AmILastThread) {
                // Bad path - for `30 seconds`.

                ntdll.LdrShutdownThread();
                ntdll.TpCheckTerminateWorker(0);
                ntdll.NtTerminateThread(0, return_code);

                // The thread execution does not return from `NtTerminateThread`, but the process still runs.
            } else {
                // Good path - for `0 seconds`.

                ntdll.RtlExitUserProcess(return_code) {
                    ntdll.EtwpShutdownPrivateLoggers();
                    ntdll.LdrpDrainWorkQueue(0);
                    ntdll.LdrpAcquireLoaderLock();
                    ntdll.RtlEnterCriticalSection(ntdll.FastPebLock);
                    ntdll.RtlLockHeap(peb.ProcessHeap);
                    ntdll.NtTerminateProcess(0, return_code);
                    ntdll.RtlUnlockProcessHeapOnProcessTerminate();
                    ntdll.RtlLeaveCriticalSection(ntdll.FastPebLock);
                    ntdll.RtlReportSilentProcessExit(CURRENT_PROCESS, return_code);
                    ntdll.LdrShutdownProcess();
                    ntdll.NtTerminateProcess(CURRENT_PROCESS, return_code);

                    // The thread execution does not return from `NtTerminateProcess` and the process is terminated.
                }
            }
        }
    }
}
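
The branch taken in RtlExitUserThread can be reduced to a toy model. The sketch below is not ntdll code: ProcessModel, ExitPath and exit_user_thread are hypothetical names, and it assumes that the only thing that matters is whether any other thread (for example a loader worker) is still alive when the exiting thread asks "am I the last thread?".

```cpp
// Toy model of the decision traced above (hypothetical names, not ntdll code).
enum class ExitPath {
    TerminateThreadOnly,  // "bad path": only the calling thread ends
    TerminateProcess      // "good path": the whole process is shut down
};

struct ProcessModel {
    int program_threads;  // threads started by the program, including the main thread
    int loader_workers;   // thread pool workers started by the parallel loader
};

// Mirrors the AmILastThread check: the process is only torn down
// when the exiting thread is the single remaining thread.
constexpr ExitPath exit_user_thread(const ProcessModel& process) {
    return (process.program_threads + process.loader_workers) == 1
        ? ExitPath::TerminateProcess
        : ExitPath::TerminateThreadOnly;
}
```

In this model, a process with only its main thread takes the RtlExitUserProcess path, while a single idle loader worker is enough to take the NtTerminateThread path and keep the process alive.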

Expected results

I expected the process to terminate if it does not create additional threads and returns from the main function.

Calling ExitProcess at the end of the main function terminates the process, even after WinAPI calls that previously caused the 30-second delay. Using this API is not always possible, however, because the problematic application might not be mine but a 3rd-party application (from Microsoft), as here: Why would a process hang within RtlExitUserProcess/LdrpDrainWorkQueue?

It seems to me that the Windows 10 process loader is broken, if even Microsoft processes behave incorrectly.

  1. Is there a clean solution to this problem?
  2. What are those loader threads needed for once the last user-created thread exits? AFAIK it is impossible at this point to load any other libraries.
Maurice Kayser
  • Yes, you must call ExitProcess yourself since you don't have the CRT doing it for you anymore. – Hans Passant Oct 25 '19 at 16:19
  • The clean solution is to call `ExitProcess`. And where do you see that something is *broken*? – RbMm Oct 25 '19 at 17:08
  • And you have a big error here: *returning from the **main** function* refers to the concrete **main** that is called from *mainCRTStartup*, and at the end of *mainCRTStartup*, `ExitProcess` is called. Your code has no **main**. – RbMm Oct 25 '19 at 17:12
  • @RbMm it seems broken to me because it works if I do not import any special libraries, and it also works on pre Windows 10. I do not understand what the loader threads are needed for if the last user created thread exits. – Maurice Kayser Oct 25 '19 at 17:33
  • @MauriceKayser - I cannot understand what concretely you think is broken. – RbMm Oct 25 '19 at 17:37
  • *if the last user created thread exits.* - the loader threads are also user threads in your process, so the last thread has not exited. – RbMm Oct 25 '19 at 17:38
  • And in any case your title is wrong - *returning from **main** ..* - you have no **main** function here, so you do not return from it either. – RbMm Oct 25 '19 at 17:40
  • I do not know how else to rephrase the question – Maurice Kayser Oct 25 '19 at 17:41
  • Nothing is broken. For a process to exit, either all threads must exit or ExitProcess/TerminateProcess must be called. This has always been the case. And you cannot assume that the DLLs loaded into the process do not create additional threads; there has never been such a guarantee. So you can never (not only on Windows 10) know how many threads are in your process, even if you do not create any yourself. As a result, only a call to ExitProcess or TerminateProcess can reliably exit the process. So I do not understand what your question and problem are. – RbMm Oct 25 '19 at 18:17
  • *I do not know how else to rephrase the question* - the question title should be *Returning from **exe entry point** does not terminate the process on Windows 10*. Note that **exe entry point** != **main**. And I have a counter-question: why **must** returning from the exe entry point terminate the process? – RbMm Oct 25 '19 at 18:31
  • I assumed the following: I am responsible for my threads, so I wait for them to exit before my main thread returns from `mainCRTStartup`. After returning, the loader will call all TLS-Callbacks in all DLLs, which are responsible for terminating their own threads. Then the process terminates after all non-standard DLLs are unloaded. This is what I expected, and what always worked for me in the past, before Windows 10 started injecting the DLL loader threads I mentioned. Why are they even necessary anymore when my last thread terminates? I am not able to load any DLL at that point. – Maurice Kayser Oct 28 '19 at 21:53
  • "After returning, the loader will call all TLS-Callbacks in all DLLs, which are responsible for terminating their own threads." There are no TLS callbacks. And even if there were, exiting the main thread does not free TLS. And even if it did, there is no such responsibility to terminate all threads when the TLS is freed. The threads you see are the system threadpool threads. The threadpool exists in every process. If you know that your process is finished, then just call ExitProcess. That is what triggers process shutdown and cleanup. – Raymond Chen Oct 28 '19 at 22:16

1 Answer

I expected the process to terminate if it does not create additional threads and returns from the main function.

A process can create additional threads implicitly; the loader does so, for example. It is also important to understand what is meant by

returns from the main function

Here it means the function that the standard CRT entry point, mainCRTStartup, calls. After that function returns, mainCRTStartup calls ExitProcess. So main is not the real entry point of the exe but a sub-function called from the entry point, and it is the entry point that then calls ExitProcess.

If we do not use the CRT, we need to call ExitProcess ourselves. If we simply return from the entry point, RtlExitUserThread is called, which does not call ExitProcess unless this is the last thread in the process (AmILastThread). There can also be a race here if two or more threads call ExitThread in parallel.
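
A minimal sketch of a CRT-free entry point that ends the process explicitly could look like the following. It assumes the same build setup as in the question (/NODEFAULTLIB, linking against kernel32.lib) and declares ExitProcess by hand in the style of the question's test code instead of including <windows.h>. run_program is a hypothetical placeholder for the real work of the application, and the `#ifdef _WIN32` guard is only there so that the placeholder also compiles on other platforms.

```cpp
#include <cstdint>

// Hypothetical placeholder for the actual work of the program.
// Returns the exit code the process should report.
int run_program() {
    return 0;
}

#ifdef _WIN32
// Hand-written declaration, as in the question's test code;
// normally this comes from <windows.h>.
extern "C" __declspec(noreturn) void __stdcall ExitProcess(uint32_t exit_code);

// Raw exe entry point used when the CRT is not linked.
extern "C" int mainCRTStartup() {
    // Do not return to BaseThreadInitThunk: returning would only end this
    // thread unless it happens to be the last one in the process.
    ExitProcess(static_cast<uint32_t>(run_program()));
}
#endif
```

Because ExitProcess does not return, the process ends regardless of how many loader or thread pool threads are still idle at that point.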

RbMm
  • Now that I know it I see code examples like [this](https://hero.handmade.network/forums/code-discussion/t/94-guide_-_how_to_avoid_c_c++_runtime_on_windows) or [this](http://in4k.untergrund.net/various%20web%20articles/Creating_Small_Win32_Executables_-_Fast_Builds.htm) explicitly killing the process. I still wonder why the exe entry point returns a value in `eax`, which is used by the loader though.. – Maurice Kayser Nov 07 '19 at 13:11
  • @MauriceKayser - after the **real** exe entry point returns, its return value is used in the call to `ExitThread`. If this is the single/last thread at that moment, the whole process exits. In the x86/x64 ABI, a *DWORD* value is returned via *eax*. – RbMm Nov 07 '19 at 13:18
  • Why does the loader do that if I am supposed to kill the process and never return from the exe entry point though? – Maurice Kayser Nov 07 '19 at 14:23
  • @MauriceKayser - returning from the exe entry point is legal. It means that this thread must exit. But you can create additional threads in the process, and they can continue to run. This is unusual but possible. – RbMm Nov 07 '19 at 14:40
  • @MauriceKayser Return from exe entry point will return to BaseThreadInitThunk, now follow the code in IDA. – Lewis Kelsey Apr 10 '21 at 19:19
  • @LewisKelsey - and then you can see the call to `RtlExitUserThread`, which leads to process exit only if this is the last thread in the process. – RbMm Apr 10 '21 at 20:32