
With Visual Studio 2015, in a new, empty C++ project, build the following as a Console application:

int main() {
    return 0;
}

Set a breakpoint on the return and launch the program in the debugger. On Windows 7, at the breakpoint, this program has only one thread. But on Windows 10, it has five(!) threads: the main thread and four "worker threads" waiting on a synchronization object.
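The thread count can also be checked from inside the process, without relying on the debugger's Threads window. The following is a minimal sketch, not part of the original question: `count_own_threads` is a hypothetical helper name, and it assumes the Tool Help API (`CreateToolhelp32Snapshot`, `Thread32First`, `Thread32Next`) is available. On non-Windows builds it simply returns -1.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <tlhelp32.h>
#endif

// Hypothetical helper: returns the number of threads owned by the calling
// process, or -1 on failure (and on non-Windows builds, where this sketch
// has nothing to query).
int count_own_threads() {
#ifdef _WIN32
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snap == INVALID_HANDLE_VALUE) return -1;

    THREADENTRY32 te;
    te.dwSize = sizeof(te);  // must be set before calling Thread32First
    const DWORD self = GetCurrentProcessId();
    int count = 0;
    // A thread snapshot lists threads of every process, so filter by owner.
    if (Thread32First(snap, &te)) {
        do {
            if (te.th32OwnerProcessID == self) ++count;
        } while (Thread32Next(snap, &te));
    }
    CloseHandle(snap);
    return count;
#else
    return -1;
#endif
}
```

Calling `count_own_threads()` as the first statement of `main` and printing the result from a release build started at the command line would show whether the extra threads exist without the debugger attached.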

Who's starting up the thread pool (or how do I find out)?

Adrian McCarthy
  • Here comes the joke about Windows and endless loops... – Alexander Shishenko Jan 16 '16 at 00:24
  • Maybe processes get a thread pool by default on Windows 10. – Jonathan Potter Jan 16 '16 at 00:27
  • I'd start by putting a breakpoint on `CreateThread`. Note that placing breakpoints by name is very common using windbg, while in the Visual Studio debugger it's possible but requires learning some unusual menu commands. – Ben Voigt Jan 16 '16 at 00:28
  • Is it the threads you observe within Visual Studio? Or is it the threads that you can see (for example in [ProcessExplorer](https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx)) when you run your code directly from the command line? – Christophe Jan 16 '16 at 00:32
  • @Christophe: I'm observing with the Threads window in Visual Studio. – Adrian McCarthy Jan 16 '16 at 00:33
  • @AdrianMcCarthy it would then be interesting to see how it is with running the release version from the command line, without the overhead of the debugger. – Christophe Jan 16 '16 at 00:40
  • @Christophe: Are you suggesting that the Visual Studio debugger is injecting a threadpool into the process under test, but only on Windows 10? – Adrian McCarthy Jan 16 '16 at 00:47
  • For one, I know that WinSock2 on at least MS Windows XP created a thread, probably used internally. Other libs might do the same. – Ulrich Eckhardt Jan 16 '16 at 08:20
  • Are you using VS 2015 on both Windows 7 and Windows 10? Have you applied/not applied VS 2015 Update 1 on both? – Isaac Jan 16 '16 at 11:02
  • @Isaac: Yes, both machines are using VS 2015 Update 1. – Adrian McCarthy Jan 16 '16 at 15:23
  • @Ulrich Eckhardt: This is a bare minimum program that doesn't include WinSock2 or any other libraries other than what the compiler needs from the language run-time libraries. Perhaps the new universal run-time libraries are doing something different on Windows 10 than on older versions. – Adrian McCarthy Jan 16 '16 at 15:31

3 Answers


Crystal ball says that the Debug > Windows > Threads window shows these threads at ntdll.dll!TppWorkerThread. To see this yourself, be sure to enable the Microsoft Symbol Server under Tools > Options > Debugging > Symbols.

This also happens in VS2013, so it is most definitely not caused by the new VS2015 diagnostic features; @Adam's guess cannot be correct.

TppWorkerThread() is the entrypoint for a thread-pool thread. When I set a breakpoint on this function with Debug > New Breakpoint > Function Breakpoint, I got lucky and captured this stack trace for the 1st threadpool thread when the 2nd threadpool thread started executing:

    ntdll.dll!_NtOpenFile@24()  Unknown
    ntdll.dll!LdrpMapDllNtFileName()    Unknown
    ntdll.dll!LdrpMapDllSearchPath()    Unknown
    ntdll.dll!LdrpProcessWork() Unknown
    ntdll.dll!_LdrpWorkCallback@12()    Unknown
    ntdll.dll!TppWorkpExecuteCallback() Unknown
    ntdll.dll!TppWorkerThread() Unknown
    kernel32.dll!@BaseThreadInitThunk@12()  Unknown
    ntdll.dll!__RtlUserThreadStart()    Unknown
>   ntdll.dll!__RtlUserThreadStart@8()  Unknown

Clearly the loader is using the threadpool on Windows 10 to load DLLs. That's certainly new :) At this point the main thread is also executing in the loader, concurrency at work.

So Windows 10 is taking advantage of multiple cores to get the process initialized faster. Very much a feature, not a bug :)

Hans Passant
  • The question is, does Windows 10 (or newer Windows versions) take the liberty to use extra threads beyond process initialization? Some programs actually depend on a certain number of threads for performance (e.g. a web server). – David Haim Jan 16 '16 at 13:23
  • Clearly the answer is *yes*. The threadpool gets lots of other use on a web server, these are not "wasted" threads. I tested this on the workstation version of Windows 10 btw, Windows Server 2016 is still only in preview right now. – Hans Passant Jan 16 '16 at 13:25
  • My question is: let's say I've profiled my server and I saw that 5 threads bring the best performance. Can new versions of Windows take the liberty to use more threads beyond the ones I have already created? I'm not asking about the use of a threadpool on web servers, but about the issue you are presenting here, which is "unwanted OS threads in my program". – David Haim Jan 16 '16 at 13:28
  • Interesting. But doesn't it seem odd that the loader doesn't close the threadpool it created once the process is running? Are they kept around because the application might call LoadLibrary later and thus require more work from the loader? – Adrian McCarthy Jan 16 '16 at 15:52
  • @Adrian that's a general thread pool that other code (including your own) could use as well. There's no reason to shut it down – Voo Jan 16 '16 at 20:28
  • Note that Windows is entitled to create as many threads as it pleases. If your program depends on there being no threads that you did not create yourself, that's a bug. – Harry Johnston Jan 17 '16 at 00:37
  • @Harry Johnston: I didn't say there's such a dependency. – Adrian McCarthy Mar 07 '16 at 23:48
  • @Voo: One of the big benefits to using native code is to not pay for resources you don't use. If my application doesn't need the thread pool, it seems odd that it still has to pay for four threads worth of stack space. It wouldn't surprise me if I were using a framework with a big runtime system. But even the simplest of programs now spins up multiple threads even if it never uses them. – Adrian McCarthy Mar 07 '16 at 23:51
  • @AdrianMcCarthy: I was just making a general observation. David (for example) seemed concerned that the extra threads would affect his profiling, though I suspect that isn't true. As for the stack space, I don't think that will significantly affect the amount of per-process overhead, as I believe Windows already has quite a lot. (Perhaps I'm mistaken.) – Harry Johnston Mar 08 '16 at 01:01
  • It's a BUG! ExitThread() on all user-created threads and main no longer causes the process to terminate; this is contractual behavior. – Joshua Jul 26 '17 at 15:35
  • @Joshua: No, it's not. It never was. – conio Sep 29 '17 at 11:38
  • Agree with @AdrianMcCarthy, as the developer I should at least be able to disable functionality like this. To use 5 threads in an app like this doesn't seem smart, even if an app like this is unlikely. But here's another reason why it's a bad idea - currently dealing with an app that is crashing in the Windows threadpool (the app isn't using thread pools), and the examined crash dump doesn't actually reveal where the problem is. The code I have control over is not crashing, but the Windows thread pool is (only on Win10, and only sometimes). It's close to impossible to determine the root cause. – Lucky Luke Jun 19 '18 at 19:26
  • Later an article was written referencing this answer, with detailed description of the loader thread pool: https://threatvector.cylance.com/en_us/home/windows-10-parallel-loading-breakdown.html – Suma Aug 05 '18 at 07:17
  • Related Raymond Chen (Old New Thing) post: https://devblogs.microsoft.com/oldnewthing/20191115-00/?p=103102 – Adrian McCarthy Nov 15 '19 at 17:25
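To illustrate Harry Johnston's point in the comments above, here is a minimal, portable sketch (not taken from the thread) in which the program keeps track of the threads it created itself rather than assuming the process contains no others. `run_workers` is a hypothetical name used only for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts `worker_count` workers, waits for exactly
// those workers, and returns how many of them ran to completion.
int run_workers(int worker_count) {
    std::atomic<int> finished{0};
    std::vector<std::thread> workers;
    workers.reserve(worker_count);
    for (int i = 0; i < worker_count; ++i) {
        // Each worker only records that it ran; real work would go here.
        workers.emplace_back([&finished] { finished.fetch_add(1); });
    }
    // Join only the threads this function created. Any threads the OS or
    // the runtime added to the process are never touched or counted.
    for (std::thread& t : workers) t.join();
    return finished.load();
}
```

Because the code joins only the handles it owns, its behaviour does not change if the loader or another library adds threads to the process.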

It's the default thread pool. https://learn.microsoft.com/en-us/windows/desktop/procthread/thread-pools

Every process has a default thread pool.

Changming Sun
  • Sure, but, before Windows 10, if the program didn't use the thread pool, then no extra threads were created. As of Windows 10, all programs now pay the cost of starting several threads (compute and memory) even if they don't need them. – Adrian McCarthy Aug 06 '18 at 13:04
  • You can disable this functionality if you really need to, see Suma's link in the comments to the accepted answer. But the cost is minimal (a few extra threads sitting in a wait state for the first thirty seconds) so I suspect that in most cases the faster load times will more than make up for that. – Harry Johnston Aug 12 '18 at 21:54

This intrigued me also, so I decided to find my personal answer. As another poster says, it's a bit of a "crystal ball" endeavour, but...

The probable cause is one of your threads called either:

  • WaitForSingleObject or
  • WaitForMultipleObjects

The implementation of this in the latest versions of Windows seems to spawn a thread pool to facilitate waiting for objects (don't know why).

This might also possibly be happening before your main because you have some code which causes a global scoped object to be created which then starts off code before you even hit your entry point (this may even be in some standard library code for Windows 10 SDK).

For anyone wanting to find out their own SPECIFIC cause, you can TRY this:

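#include <windows.h>
#include <tchar.h>

class RunBeforeMain
{
public:
    RunBeforeMain()
    {
        // ntdll.dll is already mapped into every process; this just returns its handle.
        HMODULE hNtDll = LoadLibrary(_T("ntdll.dll"));
        FARPROC lpNeeded = GetProcAddress(hNtDll, "NtWaitForMultipleObjects");
        (void)lpNeeded;  // only read in the debugger
        DebugBreak();    // break into the debugger so lpNeeded can be inspected
    }
};

// Global object: its constructor runs before WinMain.
RunBeforeMain go;

int CALLBACK WinMain(
  _In_ HINSTANCE hInstance,
  _In_ HINSTANCE hPrevInstance,
  _In_ LPSTR     lpCmdLine,
  _In_ int       nCmdShow
)
{
    return 0;
}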

When you run this, lpNeeded will hold the load address of the ntdll procedure NtWaitForMultipleObjects. Grab that address, paste it into the Disassembly window, then place a breakpoint on the first line.

Now continue running your solution.

Couple of caveats:

  1. We can't effectively control the initialisation order of globals, which is why good coding practice avoids them at all costs (unless there is some exceptional need). Because of this, we can't guarantee our global will be constructed before whatever other global causes the additional threads.
  2. Whilst this runs before main, the DLL loads of any libraries will precede any of our calls, so it might already be too late (you can use hacks like forcing no auto loading of libraries but that's way beyond my level of willingness to care here lol).
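As a hedged illustration of caveat 1, the sketch below shows the common "construct on first use" idiom, which sidesteps initialisation-order problems for a program's own globals (it cannot affect the order of globals in other libraries). It is not part of the original answer; `startup_log` and `RecordsStartup` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical log of start-up events, for illustration only.
std::vector<std::string>& startup_log() {
    // Function-local static: constructed the first time the function is
    // called (thread-safe since C++11), so it always exists before use.
    static std::vector<std::string> log;
    return log;
}

// Records its name in the log when constructed.
struct RecordsStartup {
    explicit RecordsStartup(const char* name) { startup_log().push_back(name); }
};

// Within one translation unit, globals are initialised in declaration
// order. Across translation units the relative order is not specified,
// which is the situation the caveat describes.
RecordsStartup first("first");
RecordsStartup second("second");
```

Both globals can safely use `startup_log()` during their construction because the log is created on demand rather than as a separately ordered global.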

Hope this helps someone :)

0xC0000022L
KVKConsultancy