Multithreading to perform a single task in C++

Question

Suppose I have this big_task() function that I can split between threads for speedup.

The way to solve that issue with multithreading is to call _beginthread() on each task of the function and then wait for all the threads to finish, right?

How do I know whether that's going to be efficient and will actually reduce big_task()'s running time?

I also heard that multithreading efficiency depends on the platform and hardware of the client. That means that it's something I need to query in the beginning of my program as well..?

One more question: when coding in Windows, is it better to use CreateThread() rather than _beginthread()? I write cross-platform applications, but if CreateThread() is more efficient then I could specialize my code to use that in Windows.

Can you shed some light on the type of tasks? (networking, db-access, disk-io, cpu time?) — Stefan, Nov 19 '15 at 08:24
if you're using C++11 I believe that you can use `std::thread` instead. — default, Nov 19 '15 at 08:24
Well first thing is if you are on a multiprocessor / multicore system, of course multithreading will be faster. — Alexandre Lavoie, Nov 19 '15 at 08:25
@Stefan it's plain cpu calculations. I calculate 4x4 matrices and apply the results to data structures. — McLovin, Nov 19 '15 at 08:26
@AlexandreLavoie: only if these tasks can be parallel executed ;-) — Stefan, Nov 19 '15 at 08:26
Also, consider calculating the theoretical maximal speedup with Amdahl's law; sometimes it is surprising how a task that has some non-parallelizable part becomes not so amenable to parallelization speedup. — Erik Alapää, Nov 19 '15 at 08:29
If you want to measure the exact speed up, run it with timers ( example here: http://cboard.cprogramming.com/c-programming/112560-timers-c.html) twice - once without parallel, once with. If you want to really get stuff to work fast, try using timers to find out exactly which part takes how long, and then you can use that info for more efficient paralleling... — shapiro yaacov, Nov 19 '15 at 08:38
@shapiroyaacov But the thing is I don't know what the hardware of my client/s will be, so I must code two versions of each method, one that includes multithreading and another that doesn't..? — McLovin, Nov 19 '15 at 08:44
I don't think you need to. Maybe an `if` somewhere to check the number of cores. no point paralleling stuff on a single core (don't know where you'd find nowadays something that is single cored, but anyway..). My suggestion was strictly for development purposes... — shapiro yaacov, Nov 19 '15 at 08:48
Is there a standard function that checks number of cores, so I know how many threads I need to create? — McLovin, Nov 19 '15 at 09:14
http://www.boost.org/doc/libs/1_59_0/doc/html/thread/thread_management.html#thread.thread_management.thread.hardware_concurrency — Domagoj Prelošćan, Nov 19 '15 at 09:23
@Pilpel If you are satisfied you should accept one of the answers. It does not need to be my answer ;-) — Domagoj Prelošćan, Nov 19 '15 at 14:39

score 5 · Answer 1 · answered Nov 19 '15 at 08:27

The way to solve that issue with multithreading is to call _beginthread() on each task of the function and then wait for all the threads to finish, right?

This way you will parallelize your big function, so yes, that's true.

How do I know whether that's going to be efficient and will actually reduce big_task()'s running time?

You must profile it. If your big function does pure CPU work with no I/O operations, then consider creating as many threads as your CPU has cores.

I also heard that multithreading efficiency depends on the platform and hardware of the client. That means that it's something I need to query in the beginning of my program as well..?

A CPU with more cores can run more threads truly in parallel, so CPU-bound work that splits well will generally finish sooner on it. You can look into the PPL (Windows only), TBB, or OpenMP libraries, which make sure tasks are run efficiently based on the number of CPU cores.
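If you would rather size the thread count yourself, a minimal sketch using the standard `std::thread::hardware_concurrency()` could look like the following. The helper name `worker_count` and its `leave_one_core_free` parameter are hypothetical, introduced only for illustration; the "leave one core free" option mirrors the suggestion made in the comments.

```cpp
#include <thread>

// Hypothetical helper (not from the original answer): choose how many
// worker threads to start. hardware_concurrency() is only a hint and may
// return 0 when the value cannot be determined, so fall back to 1.
unsigned worker_count(bool leave_one_core_free = false) {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) {
        return 1;  // unknown hardware: stay single-threaded
    }
    if (leave_one_core_free && n > 1) {
        --n;       // keep one core for the main thread / rest of the system
    }
    return n;
}
```

Calling `worker_count()` once at startup is usually enough; whether the resulting number actually gives the best running time still has to be confirmed by profiling.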

One more question: when coding in Windows, is it better to use CreateThread() rather than _beginthread()? I write cross-platform applications, but if CreateThread() is more efficient then I could specialize my code to use that in Windows.

If you want cross-platform code, then use std::thread or Boost for that.
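As a rough illustration of the "start the threads, then wait for all of them" pattern with `std::thread`, here is a sketch that assumes the work is a simple sum over a vector. The function `parallel_sum` is a hypothetical stand-in for big_task(), not the asker's code. Each thread gets its own contiguous slice and writes to its own slot, so no locking is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for big_task(): sum a vector by giving each
// thread its own contiguous slice of the input.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) {
        num_threads = 1;
    }
    std::vector<long long> partial(num_threads, 0);  // one slot per thread
    std::vector<std::thread> threads;
    // Slice size, rounded up so every element is covered.
    const std::size_t slice = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * slice);
        const std::size_t end = std::min(data.size(), begin + slice);
        threads.emplace_back([&data, &partial, t, begin, end] {
            // Each thread writes only partial[t], so there is no data race.
            partial[t] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }
    for (std::thread& th : threads) {
        th.join();  // wait for every worker to finish
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The same source compiles on Windows and Linux with any C++11 compiler, which is the point of preferring `std::thread` over `_beginthread()` or `CreateThread()` here.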

answered Nov 19 '15 at 08:27

marcinj


@Pilpel because it's a Windows-only function; under Linux the equivalent is the pthreads library. std::thread will work on Windows, Linux, ... – marcinj Nov 19 '15 at 08:39
@Pilpel no, it's not - it's just not cross-platform. For differences between CreateThread and _beginthreadex read here: http://stackoverflow.com/questions/331536/windows-threading-beginthread-vs-beginthreadex-vs-createthread-c. – marcinj Nov 19 '15 at 08:45
I see. Last question just to be sure: C++11 code will work on any platform that C++03 works on, right? – McLovin Nov 19 '15 at 09:03
@Pilpel yes, you just need a compiler that is c++11 compliant for that platform. – marcinj Nov 19 '15 at 09:07
Boost is convenient because it is multiplatform. You can build boost libraries for Linux and Windows. If you like you can have the same project in Qt Creator (with few switches in the project file) that will compile and debug on multiple platforms. There are more useful things in boost like shared_ptr and asio. Of course shared_ptr is available in C++11 nowadays. – Domagoj Prelošćan Nov 19 '15 at 09:14
Sorry, one more question :p. Is my main thread considered a "working thread" as well? This means that if I have 4 cores, should I create 3 more working threads (while the main thread is looping `Sleep()`), and not 4? – McLovin Nov 19 '15 at 09:15
It depends how much load you want to put on the system. If you make all the cores busy doing the "big_task", system will become slow. If you do not have much load in main thread, you do not need to consider it as worker. – Domagoj Prelošćan Nov 19 '15 at 09:18
I would ask the system how many cores it has and decide how many threads are appropriate. Pseudo code: N=getCores(); if (N > 1) N=N-1 (here I want to leave one core free if possible) – Domagoj Prelošćan Nov 19 '15 at 09:19
@Pilpel your main thread is not a working thread; if you have four cores then try with four threads, but verify, e.g. in Task Manager, that all cores are properly utilized. – marcinj Nov 19 '15 at 09:22

Domagoj Prelošćan · Accepted Answer · 2015-11-19T09:09:12.963

I hope this will help you to get started.

To achieve a speedup over the single-threaded approach you need a multi-core CPU. On a single-core CPU an additional thread will not increase the computation speed, but it may help keep other functions (e.g. the GUI) running smoothly while the CPU-intensive work is being done.

To utilize a multi-core CPU you need to split the "big task" into chunks that can be executed at the same time without affecting each other.

General flow:

1. Put the chunks into a container. Set their status to "available".
2. Create as many threads as you want (up to the actual number of CPU cores is useful).
3. This is the thread function; the threads execute it in parallel:
4. Try to grab the first "available" chunk from the container and mark it "busy". If no "available" chunk is found, exit the thread.
5. Process the chunk and mark it "ready".
6. Go back to (4).
7. In the main thread, wait for all worker threads to finish. You may wait in a loop, sleeping a second at each step and checking whether Ctrl-C was pressed, or simply "join" (wait until the thread exits) all the threads.
8. Gather all the chunks together and use the result of your computation.

Be aware that you need to take care when multiple threads access the same data, because they may interfere with each other. For example, multiple threads could take the same chunk for processing at the same time. This problem can be solved with a mutex (see boost::mutex).
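The flow above, including the mutex, can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses `std::mutex` and `std::thread` from C++11 rather than the Boost equivalents, and the names `Chunk`, `Status`, `worker` and `process_all`, as well as the squaring "work", are hypothetical placeholders.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

enum class Status { Available, Busy, Ready };

// Hypothetical chunk type: one unit of the "big task".
struct Chunk {
    int input = 0;
    int result = 0;
    Status status = Status::Available;
};

// Thread function: claim a chunk under the lock, process it outside the lock.
void worker(std::vector<Chunk>& chunks, std::mutex& m) {
    for (;;) {
        Chunk* mine = nullptr;
        {
            std::lock_guard<std::mutex> lock(m);
            for (Chunk& c : chunks) {
                if (c.status == Status::Available) {
                    c.status = Status::Busy;  // no other thread can take it now
                    mine = &c;
                    break;
                }
            }
        }
        if (mine == nullptr) {
            return;  // nothing left to do: exit the thread
        }
        mine->result = mine->input * mine->input;  // placeholder for real work
        std::lock_guard<std::mutex> lock(m);
        mine->status = Status::Ready;
    }
}

// Main-thread side: start the workers, then join all of them.
void process_all(std::vector<Chunk>& chunks, unsigned num_threads) {
    if (num_threads == 0) {
        num_threads = 1;
    }
    std::mutex m;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) {
        threads.emplace_back(worker, std::ref(chunks), std::ref(m));
    }
    for (std::thread& t : threads) {
        t.join();
    }
}
```

The lock is held only while a chunk is being claimed or marked, not while it is processed, so the workers spend most of their time running in parallel. After `process_all` returns, every chunk is "ready" and the main thread can gather the results.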

There are other approaches to this problem as well. You can put your chunks into a message queue (FIFO) and let threads pop them out from the queue and put results into some other queue. If you extend this queue over the network you can employ several PCs doing the work.
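A minimal sketch of the queue variant, staying inside one process, might look like this. It assumes all jobs are queued before the workers start, so a plain mutex-protected `std::queue` is enough; a queue that is fed while workers are running would also need a condition variable or similar. `LockedQueue` and `run_workers` are hypothetical names, and doubling each job is a placeholder for the real computation.

```cpp
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread-safe FIFO.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(std::move(value));
    }
    // Pops the oldest element into `out`; returns false if the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) {
            return false;
        }
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::queue<T> q_;
};

// Workers pop jobs until the job queue is empty and push results
// into a second queue.
void run_workers(LockedQueue<int>& jobs, LockedQueue<int>& results,
                 unsigned num_threads) {
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) {
        threads.emplace_back([&jobs, &results] {
            int job = 0;
            while (jobs.try_pop(job)) {
                results.push(job * 2);  // placeholder for real work
            }
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
}
```

Results arrive in whatever order the workers finish, so if the order matters each job should carry an index that the main thread can sort on afterwards.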

For portability you can use boost::thread.

This is useful as well: boost::thread_group
