std::thread runs A LOT slower than std::future

Question

I have a simple rendering program whose main loop runs at about 8000 fps on one thread (it does nothing except draw a background). I wanted to see whether rendering from another thread would upset the current context without changing it (to my surprise, it didn't). I did this with the following simple code:

m_Thread = std::thread(Mainloop);
m_Thread.join();

and this code somehow ran extremely slowly, at ~30 fps. I thought this was weird, and I remembered that in another project I had used std::future for a similar performance-related reason. So I tried it with std::future using the following code:

m_Future = std::async(std::launch::async, Mainloop);
m_Future.get();

and this runs just a tiny bit below the single-threaded performance (~7900 fps). Why is std::thread so much slower than std::future?

Edit:

Disregard the above code; here is a minimal reproducible example. Just toggle THREAD between 0 and 1 to compare:

#include <future>
#include <thread>
#include <chrono>
#include <Windows.h>
#include <iostream>
#include <string>

#define THREAD 1

static void Function()
{
    
}

int main()
{
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::chrono::high_resolution_clock::time_point finish = std::chrono::high_resolution_clock::now();
    long double difference = 0;
    long long unsigned int fps = 0;

#if THREAD
    std::thread worker;
#else
    std::future<void> worker;
#endif

    while (true)
    {
        //FPS 
        finish = std::chrono::high_resolution_clock::now();
        difference = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
        difference = difference / 1000000000;
        if (difference > 0.1) {
            start = std::chrono::high_resolution_clock::now();
            std::wstring fpsStr = L"Fps: ";
            fpsStr += std::to_wstring(fps);
            SetConsoleTitleW(fpsStr.c_str()); // wide-character API, matching std::wstring
            fps = 0;
        }
        
#if THREAD
        worker = std::thread(Function);
        worker.join();
#else
        worker = std::async(std::launch::async, Function);
        worker.get();
#endif

        fps += 10; // the counter is reset every 0.1 s, so scale each iteration to a per-second rate
    }

    return 0;
}

The difference here seems to be statistical noise and insignificant. Perhaps a tiny bit slower, but nowhere near "so much slower". — Sam Varshavchik, Apr 17 '21 at 01:34
What's your compiler and its version? On Windows there may be background thread pools for `async` — prehistoricpenguin, Apr 17 '21 at 01:34
@SamVarshavchik ?? I'm not comparing future with single-threaded, I'm comparing std::thread with std::future. Isn't 30 fps -> 7900 fps extremely significant? — , Apr 17 '21 at 01:35
I thought you were comparing the 8000fps metric with the 7900fps metric. In any case, C++ threads on MS Windows are notorious for their sucky performance. Google around. On MS-Windows, the likely answer is that `std::async` bypasses C++-standard threads and uses MS-Windows-specific thread code underneath `std::async` (which is NOT required to use C++ threads). — Sam Varshavchik, Apr 17 '21 at 01:41
Please create a [mre] (i.e., the simplest piece of code that reproduces the issue) so that there's sufficient context. For example, a loop of short-lived tasks would be tragically slow if it created a thread every time, but a lot faster if it reused an existing thread (which `async` can do behind the scenes). However, that's just an example, because there are only two lines of code to look at. — chris, Apr 17 '21 at 01:41
Depends on the implementation. With MSVC, I have an app that runs 5 times faster than a single thread using async, and 3 times faster using threads. That's on a 6-core system. There's overhead in creating a thread, and async uses a thread pool with less overhead. But you also need to do enough work in each thread to overcome the overhead. And you need to optimize memory use so the threads aren't accessing each other's memory. — doug, Apr 17 '21 at 01:46
I updated the question to include a reproducible example. @doug I guess that makes sense; however, it just seems extremely odd that a bare-bones `std::thread` slows a program down by an astronomical amount (provided the function doesn't do much), yet `std::future` doesn't. — , Apr 17 '21 at 02:05
But then it also raises the question: why would anyone ever use `std::thread`? I can't see a legitimate reason to use it; it's much slower and requires the same amount of code as the equivalent `std::future`. They might as well just remove it from the `std` library — , Apr 17 '21 at 02:08
You'd use `std::thread` when you need a thread that stays running for an extended period of time. You'd avoid `std::thread` when you just need to run a very small/short amount of code asynchronously, since in that scenario the overhead of launching and then destroying the thread each time would outweigh the benefit of running the code asynchronously. See: https://stackoverflow.com/questions/18274217/how-long-does-thread-creation-and-termination-take-under-windows — Jeremy Friesner, Apr 17 '21 at 02:10
Regardless, though, `std::async` can also run for an extended period of time, and it just seems objectively better to use it. It seems that the thread pool is beneficial for many short operations, as you said, but it doesn't exclude them from running for an extended period of time? — , Apr 17 '21 at 02:15
Creating a `thread` and joining it is like telling someone to buy a new car and drive it to the store and then sell the car; whereas `async` is like telling someone to take a taxi to go to the store. You can hopefully see why the thread is slower. The taxi company manages a fleet of active cars and they certainly don't buy a new car every time someone calls for a taxi. — Wyck, Apr 17 '21 at 02:33

score 0 · Accepted Answer · answered Apr 17 '21 at 02:33


The std::async can be implemented in different ways. For example, there can be a pre-allocated pool of threads, and each time you call std::async in a loop you just reuse a "hot" thread from the pool.

std::thread creates a new system thread each time you use it. That may be a significant overhead compared to reusing a thread from the pool.

I would advise you to test your code in a multithreaded environment where std::async may start competing for the pre-allocated system objects.


Dmitry Kuzminov


"*The std::async can be implemented in different ways*" Not when you use the launch policy `async`. That launch policy by itself *must* mean that the function is called "as if in a new thread of execution represented by a `thread` object". And that "as if" part means `thread_local` must be properly initialized. If that's not happening, then it's not a proper implementation of the `async` launch policy. – Nicol Bolas Apr 17 '21 at 02:47
@NicolBolas I'm trying to see the relevance of your comment. Can you explain further how you feel that impacts on the answer given? – Galik Apr 17 '21 at 02:52
"as if" doesn't mean that a new thread has to be created. In theory there could be a single additional thread that each `std::async` would use: in this case only one async procedure would run at a time, and the rest would wait for its completion. – Dmitry Kuzminov Apr 17 '21 at 02:53
@NicolBolas, in addition I would recommend that you study what *executors* would do in C++23. Executors give you more control over how complex objects like futures are executed; without executors there may be plenty of different possible implementations that don't violate the standard. – Dmitry Kuzminov Apr 17 '21 at 03:01
@DmitryKuzminov: "*In theory there could be a single additional thread*" Then how would `thread_local` variables get re-initialized? The standard says "new thread", and that means *all* of the side-effects of having a new thread must happen. This includes `thread_local`s and their initialization. – Nicol Bolas Apr 17 '21 at 03:40

@Galik: "*Can you explain further how you feel that impacts on the answer given?*" It means that, for the `async` launch policy, an implementation is *not allowed* to reuse an existing thread. It must create "a new thread". And if that's not happening, then the implementation is incorrect. Note that the `deferred|async` policy does not require the creation of "a new thread", so a thread pool is a valid implementation. – Nicol Bolas Apr 17 '21 at 03:41
@NicolBolas the standard doesn't say that multiple async procedures shall be run in parallel. It doesn't say how many of them can run in parallel. The `std::async` may have a pool of threads that will be reused on each future creation. – Dmitry Kuzminov Apr 17 '21 at 03:43
Ok, I guess I have one question then. Does std::async run just as fast as an std::thread when executing code? – Apr 17 '21 at 03:43
@aksjdhjkkjanqbdkjasndkjn, there are multiple factors that you need to consider: the system calls to create the underlying objects, the synchronization, other threads running. In the simplest case, yes: the CPU does not distinguish the code generated for a procedure invoked from std::async from the code generated for std::thread. – Dmitry Kuzminov Apr 17 '21 at 03:51
@DmitryKuzminov: "*the standard doesn't say that multiple async procedures shall be run in parallel.*" I didn't say that it did. What it says that, if you use the `async` launch policy, the function must be executed in "a new thread". Key word: new. Not someone else's thread, a *new* thread. A thread pool is *only allowed* if you're using `deferred|async`. – Nicol Bolas Apr 17 '21 at 03:52
@NicolBolas, "in a new thread" means that it is executed in *another* thread, not the one that called the `std::async`. – Dmitry Kuzminov Apr 17 '21 at 03:54
It seems that there is then no legitimate reason to even consider using `std::thread` then. If it can make a new one or grab one from a pool it will always be equivalent to or faster than using `std::thread`. What would be the benefits of using `std::thread` over `std::async` then? – Apr 17 '21 at 03:54
@DmitryKuzminov: "*"in a new thread" means that it is executed in another thread, not the one that called the `std::async`.*" That's not what the standard actually *says*: "in a new thread of execution represented by a `thread` object". It's pretty clear that this is a *new* thread, not an existing one. If it meant "another", it would say that. – Nicol Bolas Apr 17 '21 at 03:56
@aksjdhjkkjanqbdkjasndkjn, that fully depends on the usage pattern. I've already advised you to try launching ~100 threads in parallel and compare that with ~100 async calls. – Dmitry Kuzminov Apr 17 '21 at 03:57
@aksjdhjkkjanqbdkjasndkjn: "*What would be the benefits of using std::thread over std::async then?*" Well, there's the fact that thread pools are an implementation detail, and therefore implementations aren't required to provide them. If you *need* a thread pool, you should write one yourself. And doing that requires using `std::thread`. – Nicol Bolas Apr 17 '21 at 03:58
@NicolBolas, the standard says: "The function template async runs the function f asynchronously (potentially in a separate thread which might be a part of a thread pool)". The statement "as if spawned by std::thread(std::forward<F>(f), std::forward<Args>(args)...)" doesn't mean that a `std::thread` is actually created, but only requires proper thread-local initialization. Moreover, the actual underlying system threads may be created differently. – Dmitry Kuzminov Apr 17 '21 at 04:16
@DmitryKuzminov: "*the standard says: "The function template async runs the function f asynchronously (potentially in a separate thread which might be a part of a thread pool)".*" No, it doesn't. In fact, the words "thread pool" do not appear *anywhere* in the standard. Are you talking about cppreference or [the actual standard](https://timsong-cpp.github.io/cppwp/n4659/futures.async)? "*claims only proper thread-locals initialization*" And that's my point: you can't have a thread pool if you have to reinitialize thread-locals. – Nicol Bolas Apr 17 '21 at 04:35
@DmitryKuzminov OK, I ran it quite a lot: just a while loop counting up by 1 to 10000000 or so, using rand()%2 and adding 1 whether it's 0 or 1, just so the release build doesn't optimize it away. It's the same random seed, so it's the same test for `std::thread` and `std::future`. I ran it 100 times and took the average, and `std::thread` runs 5% faster. Is there any explanation for this? – Apr 17 '21 at 04:44
@aksjdhjkkjanqbdkjasndkjn, my assumption is that `std::thread`s really run in parallel (as much as the system can allow that), while `std::async` has a thread pool with a limited number of threads to reuse. – Dmitry Kuzminov Apr 17 '21 at 04:51
@DmitryKuzminov Seems odd, though, that it would do this given that the parameter `std::launch::async` should guarantee another thread of execution. Oh, and about that: I wasn't testing all threads running in parallel. I was actually testing the runtime of a function on a SINGLE thread, then joining it and making a new one in a for loop. – Apr 17 '21 at 04:53
@NicolBolas, first of all, the costs of creating a new `std::thread` and a new system thread are different and need to be considered separately. Next, your link says nothing about thread-locals (whether new ones are created or old ones are reused and cleared). – Dmitry Kuzminov Apr 17 '21 at 04:59
*"you can't have a thread pool if you have to reinitialize thread-locals"* - I don't really see why not. For a new thread the system has to create the thread, initialize the `thread_local`s and then run the passed-in function. For a thread pool it can do all that except creating a new thread. It can just re-initialize an old thread's `thread_local`s. – Galik Apr 17 '21 at 04:59
@DmitryKuzminov: "*Next, your link says nothing about thread-locals (whether new locals are created or old one reused and cleared).*" I only linked to how `std::async` behaves. When it says "new thread", that means it does what creating a new thread does. And the standard explains that `thread_local`s are thread local variables whose values on first access by a new thread must be the value they are initialized to. That's just how it behaves. – Nicol Bolas Apr 17 '21 at 05:32
@NicolBolas, neither your link nor cppreference say "new thread", but relax the statements to "as if..." That could mean the reuse of previously created objects as well. – Dmitry Kuzminov Apr 17 '21 at 05:37
@DmitryKuzminov: The "as if" rule requires that all observable effects of creating a new thread must happen. Initializing `thread_local`s is an "observable effect" for obvious reasons, and therefore `async` launch policy must do that. Which is exactly what I said in my first comment. You can't just use a thread-pool, because those can't initialize thread-locals. – Nicol Bolas Apr 17 '21 at 05:43
Nicol is right here. The answer is wrong. std::async was created as a utility around std::thread. std::async must not use a thread pool in this case. I'm sure that in a more accurate benchmark there will be no significant difference either way. – David Haim Apr 17 '21 at 17:33

score 0 · Answer 2 · answered Apr 17 '21 at 15:39

In some versions of the MSVC C++ standard library, std::async pulls from a (system) thread pool, while std::thread does not. This can cause problems: I have exhausted that pool in the past and gotten deadlocks. It also means that casual use is faster.

My advice is to write your own thread pool on top of std::thread and use that. You'll have full control over how many threads you have active.

This is a hard problem to get right, but depending on someone else to solve it doesn't work, because honestly the standard library implementations I have used do not reliably solve it.

Note that in an N-sized thread pool, a blocking dependency chain of size N will deadlock. If you make the number of threads equal to the number of CPUs and don't reliably reuse the calling thread, you'll find that multithreaded code tested on 4+ core machines often deadlocks on 2-core machines.

At the same time, if you make a thread pool for each task, and they stack, you'll end up thrashing the CPU.

Note that the standard is annoyingly vague about how many threads you can actually expect to run. While std::async has to behave "as if" you made a new std::thread, in practice that just means it has to reinitialize and destroy any thread_local objects.

There are eventual progress guarantees in the standard, but I have seen them violated in actual implementations when using std::async. So I now avoid using it directly.
