
I want to run four threads in parallel (not concurrently¹), each doing completely independent work. I'm new to parallelism and have a couple of questions. The reason I want to do this is that performance is really important to me. I'm working on a 4-core Windows machine, using C++ in Visual Studio Community 2015.

Should I try to schedule the threads myself, so that each one runs on a different core, or should I leave that to the OS scheduler? My feeling is that it would be faster if I forced each thread onto a different core. How can I do that?

This is what I have tried so far:

#include <thread>

void t1() { /* do something */ }
void t2() { /* do something */ }
void t3() { /* do something */ }
void t4() { /* do something */ }

int main() {
   std::thread thread1(t1);
   std::thread thread2(t2);
   std::thread thread3(t3);
   std::thread thread4(t4);

   thread1.join();
   thread2.join();
   thread3.join();
   thread4.join();
}

I know that join() blocks the calling thread until the joined thread finishes, but I'm not sure whether that makes the threads run in parallel. Is my code executing the threads concurrently or in parallel?


¹ Note:

Concurrency is essentially when two tasks are in progress during the same period of time. This might mean that one is 'paused' for a short duration while the other is being worked on.

Parallelism requires that at least two processes/tasks are actively being performed at a particular moment in time.

Adrian Mole
mata
  • "concurrently or in parallel" <-- those words mean the same thing. What are you asking? – Blorgbeard Aug 15 '16 at 20:33
  • You cannot choose which core a thread will run on; that is done by the operating system. – Anton K Aug 15 '16 at 20:37
  • @Blorgbeard See the edit, and search for concurrency vs parallelism; you will see what I mean. – mata Aug 15 '16 at 20:37
  • You probably mean consecutively. Threads running on separate cores at the same time are running concurrently. Also, thread creation is slow and any exception in one will shut your program down. Use `std::async` with `std::launch::async` as these use a thread pool and can greatly improve execution time. – doug Aug 15 '16 at 20:37
  • OS scheduler obviously is written by more experienced programmers than you. So I bet on OS scheduler. – Slava Aug 15 '16 at 20:42
  • @doug Concurrency is a property of the operating system; parallelism is a property of the CPU. These are different things. http://www.raywenderlich.com/wp-content/uploads/2014/01/Concurrency_vs_Parallelism.png – Anton K Aug 15 '16 at 20:42
  • @doug Yes, they are running at the same time, but one can be paused while the other is running, then the inverse, and so on. Please see the photo: http://www.raywenderlich.com/wp-content/uploads/2014/03/Concurrency_vs_Parallelism.png – mata Aug 15 '16 at 20:42
  • Ok, I see concurrency is a superset of parallelism which includes time slicing on one core. That portion of time when threads run simultaneously would be considered parallel and one desires to get that percentage as high as possible when the cores are available. – doug Aug 15 '16 at 21:59

3 Answers


You're done, no need to schedule anything. As long as there are multiple processors available, your threads will run simultaneously on available cores.

If there are fewer than 4 processors available, say 2, your threads will run in an interleaved manner, with up to 2 running at any given time.


P.S. It's also easy to see this for yourself: just run four infinite loops in four different threads, and you will see four CPUs being used.


DISCLAIMER: Of course, "under the hood" the OS does the scheduling for you, so you depend on the quality of the OS scheduler for concurrency. The fairness of the scheduler of the OS on which a C++ application runs is outside the C++ standard, and so is not guaranteed. In practice, though, especially when learning to write concurrent applications, most modern OSes provide adequately fair thread scheduling.

rustyx
  • @rustyx Just to be 100% sure: you are saying that in the code I wrote above, I'm executing them in parallel (and not concurrently; see the definition above)? Thanks – mata Aug 15 '16 at 20:46
  • @mata - they will run in parallel, as long as there are 4 processors available. – rustyx Aug 15 '16 at 20:47
  • @rustyx Great! I have one more question about join(). Imagine 4 threads (same workload): 1st thread: 1,2; 2nd: 3,4; 3rd: 5,6; 4th: 7,8. The numbers (1, 2, ...) mean task1, ... If I join them, 1, 3, 5, 7 are executed at the same time (on separate cores) in one cycle, then 2, 4, 6, 8 are executed in the second cycle, and the 4 threads finish at the same time (plus or minus a couple of milliseconds). Is my example correct? If an execution of a program takes 16s on one thread, will it take around 4s on 4 threads (plus the time to create the threads)? – mata Aug 15 '16 at 20:59
  • @mata - the order of `join` doesn't matter. In the end, you want to wait until *all* of the threads are done. If a thread is already finished when you `join` it, `join` will immediately return. p.s. If you have more questions, pls post a new question, your original one is answered, isn't it? :) – rustyx Aug 15 '16 at 21:05

There is no standard way to set the affinity of a given thread; under the hood, std::thread is implemented with POSIX threads on Linux/Unix and with Windows threads on Windows. The solution is to use the native APIs. For example, under Windows the following code will fully utilize all 8 cores of my i7 CPU:

  #include <cassert>
  #include <thread>
  #include <vector>
  #include <windows.h>

  int main() {
    auto fn = []() { while (true); };  // busy loop to saturate a core
    std::vector<std::thread> at;
    const int num_of_cores = 8;
    for (int n = 0; n < num_of_cores; n++) {
      at.push_back(std::thread(fn));
      // Pin the new thread to core n (for POSIX: use pthread_setaffinity_np).
      BOOL res = SetThreadAffinityMask(at.back().native_handle(), 1u << n);
      assert(res);
    }
    for (auto& t : at) t.join();
  }

But after commenting out the SetThreadAffinityMask call I still get the same result: all cores are fully utilized. So the Windows scheduler does a good job.

If you want better control over the system's cores, look into libraries like OpenMP, TBB (Threading Building Blocks), or PPL, in that order.

marcinj

Well, you might want to set the application's affinity if you wish. Basically, if you have something like an i7 CPU with 4 cores/8 threads, your app would be faster if you set affinity to one thread per physical core (not per logical processor).

There is a command-line way to do so: Set affinity with start /AFFINITY command on Windows 7

Also, affinity can be set via Task Manager: http://www.windowscentral.com/assign-specific-processor-cores-apps-windows-10

Severin Pappadeux
  • "your app would be faster if you set affinity one thread per core" proof? – Slava Aug 15 '16 at 21:07
  • 1
    @Slava because hyperthreading means sharing resources - caches, branch predictors etc. Less sharing - more resources to your app. Good read: http://www.agner.org/optimize/blog/read.php?i=6&v=t – Severin Pappadeux Aug 15 '16 at 21:53
  • A simpler and, IMHO, more proper way would be to disable hyperthreading altogether. Thread affinity is not for beginners; it is abused way too often (mostly for the cool factor) and may do more harm than good. – Slava Aug 15 '16 at 21:56
  • @Slava Simpler, yes; proper, not sure. If you use the same computer to compile/debug/test and then run, having HT enabled certainly helps with parallel compilation: while some threads are waiting for events, I/O, etc., others keep working. Whether to set affinity internally via the API or externally via batch/Task Manager is a matter of personal preference. I personally prefer batch... – Severin Pappadeux Aug 16 '16 at 00:48