5

What's the most efficient way to return a value from a thread in C++11?

vector<thread> t(6);

for (int i = 0; i < 6; i++)
    t[i] = thread(do_c);

for (thread& t_now : t)
    t_now.join();

for (int i = 0; i < 6; i++)
    cout << /*the return of function do_c*/;

Also, if the change will benefit performance, feel free to recommend a thread implementation other than std::thread.

Luka
  • Did you accidentally a word? Return a value from a thread, right? – ta.speot.is Jan 13 '14 at 02:11
  • ? what do you mean? yes – Luka Jan 13 '14 at 02:12
  • yes, but I want the most efficient way – Luka Jan 13 '14 at 02:13
  • I am looking for a low level wrapper not a high level one – Luka Jan 13 '14 at 02:13
  • Canonical ways to do this in C++11 are presented in the code example on cppreference: http://en.cppreference.com/w/cpp/thread/future Those are high-level, but there is no reason to assume they aren't efficient. – jogojapan Jan 13 '14 at 02:14
  • If I am launching hundreds of threads, can I get them faster than std::thread? – Luka Jan 13 '14 at 02:15
  • When you say "the most efficient CPU-wise way", you mean "the method that is most efficient CPU-wise", right? – jogojapan Jan 13 '14 at 02:15
  • So how many different ways have you discovered so far? – woolstar Jan 13 '14 at 02:15
  • boost thread, std::thread, async, future and posix thread – Luka Jan 13 '14 at 02:16
  • 1
    @luka - If you are launching 100's of threads there lies the performance problem. Having multiple threads only makes sense when you have enough processors and/or it makes sense in terms of easy program implementation. Otherwise you lose a lot of performance in context switching. Anyway joining threads do not have much of a performance overhead as the call gets blocked by the OS – Ed Heal Jan 13 '14 at 02:17
  • yes, I know this but many threads make sense when you download stuff. – Luka Jan 13 '14 at 02:17
  • That doesn't sound like those threads would be CPU-bound. – jogojapan Jan 13 '14 at 02:18
  • many threads doesn't necessarily make sense when you download stuff. it all depends on the specific scenario, and in all likelihood, it doesn't make sense. – thang Jan 13 '14 at 02:18
  • @luka - No it does not - There are other techniques. e.g. `select` – Ed Heal Jan 13 '14 at 02:19
  • If I download stuff from a server and the server responds with limited speed e.g. 100kbps but my speed is pretty high eg 10mbps, I bet 1b$ the way to download as many stuff as possible in the shortest time possible, is with 100 threads. – Luka Jan 13 '14 at 02:21
  • @luka - Using select `will` not require context switches. Besides you dot have 1b$ – Ed Heal Jan 13 '14 at 02:24
  • :P when you say select you mean I must have database access to the server, right? If that's the case, I have to database access to the server. – Luka Jan 13 '14 at 02:24
  • I understand the drawback of context switching, but I believe I have no other option here. Because the CPU is pretty important and must be used for other tasks, I am looking to do this using as little CPU as possible. – Luka Jan 13 '14 at 02:26
  • @Luka - See http://linux.die.net/man/2/select - There is an equivalent for windows – Ed Heal Jan 13 '14 at 02:26
  • 1
    somebody doesn't understand the difference between io bound and cpu bound here – ScarletAmaranth Jan 13 '14 at 02:27
  • Owh, I didn't know that about select, I will search it a bit... – Luka Jan 13 '14 at 02:28
  • i will take your bet for $1B. seems the question was asked because the poster doesn't know about many things. i/o vs cpu, socket api, etc. i would close the question before more arguments ensue. – thang Jan 13 '14 at 02:28
  • Threads are in general heavy. Having significantly more threads than processors is rarely efficient. Multiplex your downloads in one thread using `select` and use asychronous (non-blocking) io to save them out to disk. – Yakk - Adam Nevraumont Jan 13 '14 at 02:28
  • `select` blocks (i.e. gets put in a `blocked` queue). When something blocks, it will not be scheduled until it has received an event (i.e. it won't use CPU at all). – RageD Jan 13 '14 at 02:28
  • aha, this seems great... when can I learn more about those blocks? – Luka Jan 13 '14 at 02:31
  • @luka - read the manual page that i gave the link to. – Ed Heal Jan 13 '14 at 02:32
  • If I want to do this with boost::asio, how to do it? I mean, where in the boost doc is? – Luka Jan 13 '14 at 02:33
  • http://stackoverflow.com/questions/14422769/select-functionality-in-boostasio – thang Jan 13 '14 at 02:41

2 Answers

8

First of all, std::thread doesn't return a value, but the function that is passed to it on construction may very well do so.

There's no way to access the function's return value from the std::thread object itself, so you have to save it somewhere when the function runs on the thread.

A simple solution is, for example, to pass a reference into the thread and store the result in the referenced object. As always with threads, one must be careful not to introduce a data race.

Consider a simple function:

int func() {
    return 1;
}

And this example:

std::atomic<int> x{0}; // Use std::atomic to prevent data race.

std::thread t{[&x] {   // Simple lambda that captures a reference of x.
    x = func();        // Call function and assign return value.
}};

/* Do something while thread is running... */

t.join();

std::cout << "Value: " << x << std::endl;

Now, instead of dealing with this low-level concurrency stuff yourself, you can use the Standard Library, as someone (as always) has already solved it for you. std::packaged_task and std::future are designed to work with std::thread for exactly this kind of problem. They should also be just as efficient as the custom solution in most cases.

Here's an equivalent example using std::packaged_task and std::future:

std::packaged_task<int()> task{func}; // Create task using func.
auto future = task.get_future();      // Get the future object.

std::thread t{std::move(task)};       // std::packaged_task is move-only.

/* Do something while thread is running... */

t.join();

std::cout << "Value: " << future.get() << std::endl; // Get the stored result (already available after join).

Don't always assume something is less efficient just because it is considered "high level".

Felix Glas
  • The question that springs to my mind is: what *functional* requirement would justify to use a mechanism designed with parallel processing of sizeable amounts of data in mind to retrieve a mere integer thread return value? Looks more like what I call a syntax-driven software design. These kind of solutions sit pretty as C++11 versatility showcases, but they ***do*** hide considerable resource consumption under an icing of syntactic sugar, IMHO. – kuroi neko Jan 13 '14 at 05:07
  • @kuroineko It all depends on the domain i guess. For general purpose application development I would go with the standardized classes and functions for maximum understandability among coworkers, instead of custom solutions. Are there any functional requirements to justify using the abstractions? Well I can't think of any. But lacking a functional requirement to justify abstracting code for better maintainability doesn't make it less important. Non-functional requirements are also essential. – Felix Glas Jan 13 '14 at 16:03
  • As I see it, it's a matter of finding a sweet spot between ease of use and wasteful, if not potentially dangerous habits. In that specific case, I think it's worth pondering a bit about the price of comfort. I did design embedded software for a decade or so, that's probably why seeing so much resources invested to avoid so little an effort tends to make me nervous :). – kuroi neko Jan 13 '14 at 16:14
6

Launching a thread and terminating it requires many hundreds of machine cycles. But that's only the beginning. Context switches between threads, which are bound to happen if the threads are doing anything useful, will each consume many hundreds more. The execution contexts of all these threads will consume many a byte of memory, which in turn will trash many a cache line, hindering the CPU for yet more hundreds of machine cycles.

As a matter of fact, doing anything with multitasking is a great consumer of many hundreds of machine cycles. Multitasking only becomes profitable in terms of CPU power usage when you manage to get enough processors working on lumps of data that are conceptually independent (so parallel processing won't threaten their integrity) and big enough to show a net gain compared with a monoprocessor version.

In all other cases, multitasking is inherently inefficient in all domains but one: reactivity. A task can react very quickly and precisely to an external event that ultimately comes from some external H/W component (be it the internal clock for timers or your WiFi/Ethernet controller for network traffic).

This ability to wait for external events without wasting CPU is what increases the overall CPU efficiency. And that's it.
In terms of other performance parameters (memory consumption, time wasted inside kernel calls, etc), launching a new thread is always a net loss.

In a nutshell, the art of multitasking programming boils down to:

  • identifying the external I/O flows you will have to handle
  • taking reactivity requirements into account (remembering that more reactive = less CPU/memory efficient 99% of the time)
  • setting up handlers for the required events with a reasonable efficiency/ease of maintenance compromise.

Multiprocessor architectures are adding a new level of complexity, since any program can now be seen as a process having a number of external CPUs at hand, that could be used as additional power sources. But your problem does not seem to have anything to do with that.

A measure of multitasking efficiency will ultimately depend on the number of external events a given program is expected to cope with simultaneously and within a given set of reactivity limits.

At last I come to your particular question.

To react to external events, launching a task each time a new twig or bit of dead insect has to be moved around the anthill is a very coarse and inefficient approach.

You have many powerful synchronization tools at your disposal that will allow you to react to a bunch of asynchronous events from within a single task context with (near) optimal efficiency at (virtually) no cost: typically, blocking waits on multiple inputs, like the Unix-flavoured select() or its Microsoft counterpart WaitForMultipleObjects().

Using these will give you a performance boost incomparably greater than the few dozen CPU cycles you could squeeze out of this task-result-gathering-optimization project of yours.

So my answer is: don't bother with optimizing thread setup at all. It's a non-issue.

Your time would be better spent rethinking your architecture so that a handful of well thought out threads could replace the hordes of useless CPU and memory hogs your current design would spawn.

kuroi neko
  • Thanks for the explanation, neko-san. It made me change my mind and decide not to use threads. – Luka Jan 13 '14 at 03:42
  • @Luka glad to have contributed to spare you (and your compiler and your CPUs) a lot of unnecessary suffering ;) – kuroi neko Jan 13 '14 at 03:48
  • Sorry, I downvoted your post by mistake, please edit it to upvote it... – Luka Jan 13 '14 at 07:59