
Based on this question: I have a class whose constructor does only some assignments; a build() member function then does the actual work.

I know that the number of objects I will have to build is in the range [2, 16]. The actual number is a user parameter.

I create my objects in a for loop like this:

for (int i = 0; i < n; ++i) {
  roots.push_back(RKD<DivisionSpace>(...));
}

and then in another for loop I create the threads. Every thread calls build() on a chunk of the objects, based on this logic:

If your vector has n elements and you have p threads, thread i writes only to elements

[i n / p, (i + 1) n / p).

So for example, the situation is like this:

std::vector<RKD<Foo>> foos;
// here is a for loop that pushes back 'n' objects to foos

// thread A         // thread B                 // thread C
foos[0].build();    foos[n / 3 + 0].build();    foos[2 * n / 3 + 0].build();
foos[1].build();    foos[n / 3 + 1].build();    foos[2 * n / 3 + 1].build();
foos[2].build();    foos[n / 3 + 2].build();    foos[2 * n / 3 + 2].build();
...                 ...                         ...
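The chunking above can be sketched like this (a sketch, not the actual code: `Foo` stands in for the asker's `RKD<DivisionSpace>` class, and `build()` here just sets a flag):

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the real RKD<DivisionSpace> class from the question.
struct Foo { bool built = false; void build() { built = true; } };

// Thread i builds only the half-open range [i*n/p, (i+1)*n/p).
void build_chunked(std::vector<Foo>& foos, unsigned p) {
    const std::size_t n = foos.size();
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < p; ++i) {
        workers.emplace_back([&foos, n, p, i] {
            for (std::size_t j = i * n / p; j < (i + 1) * n / p; ++j)
                foos[j].build();   // disjoint ranges: no synchronization needed
        });
    }
    for (auto& t : workers) t.join();
}
```

Because the ranges are disjoint and the vector is fully populated before the threads start, the threads never touch the same element.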

The approach I followed was to determine the number of threads p like this:

p = min(n, P) 

where n is the number of objects I want to create and P the return value of std::thread::hardware_concurrency. After dealing with some issues that C++11 feature has, I read this:

Even when hardware_concurrency is implemented, it cannot be relied upon as a direct mapping to the number of cores. This is what the standard says it returns - The number of hardware thread contexts. And it goes on to state - This value should only be considered to be a hint. If your machine has hyperthreading enabled, it's entirely possible the value returned will be 2x the number of cores. If you want a reliable answer, you'll need to use whatever facilities your OS provides. – Praetorian

That means that I should probably change approach, since this code is meant to be executed from several users (and I mean not only in my system, many people are going to run that code). So, I would like to choose the number of threads in a way that will be both standard and efficient. Since the number of objects is relatively small, is there some rule to follow or something?
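The p = min(n, P) computation described above might be sketched as follows; the only wrinkle worth handling is that the standard allows hardware_concurrency() to return 0 when the value is not computable (the fallback value of 2 here is an arbitrary choice):

```cpp
#include <algorithm>
#include <thread>

// p = min(n, P), where P comes from hardware_concurrency().
unsigned pick_thread_count(unsigned n_objects) {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 2;               // 0 means "unknown"; pick a fallback
    return std::min(n_objects, hw);
}
```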

gsamaras
  • I don't think the quote you provided means you need to change the approach; it only says that the value may be the number of *logical* (or *virtual*) cores, which is fine - you *want* to take advantage of hyperthreading if it's enabled. There's no point in using half the number of threads if your processor can run twice as many in parallel. – bogdan Jan 15 '15 at 21:58
  • @bogdan if you could make an answer analysing your point, that would be great, since it's not quite clear for me now :/ – gsamaras Jan 15 '15 at 22:00
  • OpenMP defines [a number of environment variables](https://gcc.gnu.org/onlinedocs/libgomp/Environment-Variables.html) that can, among other things, be used to select the maximum number of threads it will create. You can make your application check similarly named variables and if they are not set by the user, fall back to `std::thread::hardware_concurrency`. That's what I would probably do. – 5gon12eder Jan 15 '15 at 22:02
  • @bogdan see my edit! Could you make an example or something straightforward 5gon12eder? – gsamaras Jan 15 '15 at 22:04
  • Correct @bogdan, sorry for the confusion. :/ – gsamaras Jan 15 '15 at 22:06
  • OK, then I think my first comment stands. Are you worried about hyperthreading, or something else? – bogdan Jan 15 '15 at 22:16
  • I only worry for the number of threads I am going to use @bogdan. :) – gsamaras Jan 15 '15 at 22:34

1 Answer


Just pick a thread pool of hardware_concurrency threads and queue the items on a first come, first served basis.

If other processes in the system somehow get priority from the OS, so be it. This simply means that fewer threads than the allocated pool size (e.g. P - 1) can run simultaneously. It doesn't matter, since the first available pool thread that is done build()-ing one item will pick the next item from the queue.
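The first-come-first-served queue can be sketched with a shared atomic index instead of an explicit queue, since the work items already live in a vector (a sketch; `Item` stands in for the asker's objects):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the real objects being built.
struct Item { bool built = false; void build() { built = true; } };

// hardware_concurrency() threads each pull the next index from a shared
// atomic counter: first come, first served, no up-front chunking.
void build_all(std::vector<Item>& items) {
    unsigned p = std::thread::hardware_concurrency();
    if (p == 0) p = 2;                          // 0 means "unknown"
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < p; ++t) {
        pool.emplace_back([&] {
            // next++ is an atomic fetch_add, so no index is handed out twice.
            for (std::size_t i = next++; i < items.size(); i = next++)
                items[i].build();
        });
    }
    for (auto& t : pool) t.join();
}
```

A thread that happens to get more CPU time simply grabs more items, which is exactly the load-balancing behavior the answer describes.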

To really avoid threads competing over the same core, you could

  • use a semaphore (interprocess semaphore if you want to actually coordinate the builder threads from separate processes)

  • thread affinity (to prevent the OS from scheduling a particular thread onto a different core the next time slice); sadly, I don't think there is a standard, platform-independent way to set thread affinity (yet).
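The semaphore idea from the first bullet could look like the following hand-rolled counting semaphore (a sketch: std::counting_semaphore only exists since C++20, so this uses a mutex and condition variable; the class name and the available() helper are my own):

```cpp
#include <condition_variable>
#include <mutex>

// Minimal counting semaphore: at most `count` builder threads run build()
// between acquire() and release() at any one time.
class Semaphore {
public:
    explicit Semaphore(int count) : count_(count) {}
    void acquire() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
    void release() {
        { std::lock_guard<std::mutex> lk(m_); ++count_; }
        cv_.notify_one();
    }
    int available() {                  // helper for inspection/testing
        std::lock_guard<std::mutex> lk(m_);
        return count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};
```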

I see no compelling reason to make it more complicated.

sehe
  • "Just pick a thread pool of hardware_concurrency threads". You mean a pool of `std::thread`s with size equal to the return value of hardware_concurrency? – gsamaras Jan 15 '15 at 22:38
  • Yes. See e.g. here http://stackoverflow.com/questions/22569805/boost-thread-throwing-exception-thread-resource-error-resource-temporarily-una/22570554#22570554 (the "Crazy Friday bonus") – sehe Jan 15 '15 at 22:39
  • The example uses boost, which I want to avoid, but I get the concept. So every thread takes just one element, not more than one like in my question above, right? – gsamaras Jan 15 '15 at 22:46
  • Every thread takes from the queue until the queue is empty, is my idea. One at a time. – sehe Jan 15 '15 at 23:31
  • I see. The final part is that I do not understand why they need to compete with each other. – gsamaras Jan 15 '15 at 23:40
  • They don't. But it might happen because there might be more processes active on the same machine (as you suggested in the question). When the OS decides to give threads from another process execution time, threads from your process will effectively have fewer cores available. Because the OS will (by default) try to achieve fair scheduling, it would start running different threads on the same logical core, in alternation. This is what I mean by: the threads are competing for a core. (Thread affinity can avoid this; proper synchronization with e.g. a semaphore can alleviate this) – sehe Jan 15 '15 at 23:43
  • @G.Samaras It's Crazy Friday again: [c++11 version of the **`thread_pool`** example](http://coliru.stacked-crooked.com/a/47cb1af684fc62db). That's: no boost at all – sehe Jan 16 '15 at 00:16
  • I see now! Wow thanks! May I ask what's Crazy Friday? You mentioned it in the other example too. :) You got my +1 again and I will accept your answer shortly in the future. I will also give a +1 to your boost answer as well. – gsamaras Jan 16 '15 at 08:26
  • @G.Samaras "Crazy Friday" is what I jokingly said back then because I went a bit overboard with the answer :) And today happens to be Friday, so... it still applies to the no-boost version too – sehe Jan 16 '15 at 08:28
  • If you really think you're cleverer than the people writing the OS scheduler you'll be wrong in the vast majority of cases. Screwing with thread affinity doesn't help at all with threads of other processes and the claim that modern schedulers will move threads from one core to the next for no good reason would warrant some proof - avoiding doing so is not particularly fancy. – Voo Jan 16 '15 at 11:27