Following up on this question: I have a class whose constructor only does a few assignments, and a separate build() member function that does the actual work.
I know that the number of objects I will have to build is in the range of [2, 16]. The actual number is a user parameter.
I create my objects in a for loop like this:
for (int i = 0; i < n; ++i) {
    roots.push_back(RKD<DivisionSpace>(...));
}
and then, in another for loop, I create the threads. Every thread calls build() on a chunk of the objects, based on this logic:
If your vector has n elements and you have p threads, thread i writes only to elements [i * n / p, (i + 1) * n / p).
So for example, the situation is like this:
std::vector<RKD<Foo>> foos;
// here is a for loop that pushes back 'n' objects to foos
// thread A         // thread B                  // thread C
foos[0].build();    foos[n / 3 + 0].build();     foos[2 * n / 3 + 0].build();
foos[1].build();    foos[n / 3 + 1].build();     foos[2 * n / 3 + 1].build();
foos[2].build();    foos[n / 3 + 2].build();     foos[2 * n / 3 + 2].build();
...                 ...                          ...
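To make the chunking rule concrete, here is a minimal sketch of it. `Item` is a hypothetical stand-in for my RKD<Foo> class (its build() only sets a flag), and `chunk_bounds` and `build_all` are illustrative names, not part of my real code. It assumes p >= 1; if p > n, some threads simply get an empty range.

```cpp
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for RKD<Foo>: build() only records that it ran.
struct Item {
    bool built = false;
    void build() { built = true; }
};

// Half-open index range [i * n / p, (i + 1) * n / p) owned by thread i.
std::pair<std::size_t, std::size_t> chunk_bounds(std::size_t i,
                                                 std::size_t n,
                                                 std::size_t p) {
    return {i * n / p, (i + 1) * n / p};
}

// Calls build() on every item using p threads (assumes p >= 1).
// Each thread touches only the elements of its own chunk.
void build_all(std::vector<Item>& items, std::size_t p) {
    const std::size_t n = items.size();
    std::vector<std::thread> workers;
    workers.reserve(p);
    for (std::size_t i = 0; i < p; ++i) {
        const auto range = chunk_bounds(i, n, p);
        workers.emplace_back([&items, range] {
            for (std::size_t k = range.first; k < range.second; ++k) {
                items[k].build();
            }
        });
    }
    // Wait for all chunks to finish before returning.
    for (auto& w : workers) {
        w.join();
    }
}
```

With n = 7 and p = 3, for example, the three threads get the ranges [0, 2), [2, 4) and [4, 7), so the chunks cover every element exactly once even when p does not divide n.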
The approach I followed was to determine the number of threads p like this:
p = min(n, P)
where n is the number of objects I want to create and P is the return value of std::thread::hardware_concurrency(). After dealing with some issues that this C++11 feature has, I read this:
Even when hardware_concurrency is implemented, it cannot be relied as a direct mapping to the number of cores. This is what the standard says it returns: "The number of hardware thread contexts." And goes on to state: "This value should only be considered to be a hint." If your machine has hyperthreading enabled, it's entirely possible the value returned will be 2x the number of cores. If you want a reliable answer, you'll need to use whatever facilities your OS provides. – Praetorian
That means I should probably change my approach, since this code is meant to be run by many users, not only on my own system. So I would like to choose the number of threads in a way that is both standard (portable C++) and efficient. Since the number of objects is relatively small, is there a rule of thumb to follow?
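For reference, this is roughly how I compute p at the moment, written as a minimal sketch. `choose_thread_count` and the `fallback` value of 2 are assumptions made for illustration only. The guard is there because std::thread::hardware_concurrency() is allowed to return 0 when the value is not computable or not well defined.

```cpp
#include <algorithm>
#include <thread>

// Returns p = min(n, P), clamped to at least 1.
// `hw` is expected to be the result of std::thread::hardware_concurrency(),
// which may be 0 when the value is unknown; `fallback` (an arbitrary
// assumption, 2 here) is used in that case.
unsigned choose_thread_count(unsigned n, unsigned hw, unsigned fallback = 2) {
    const unsigned P = (hw == 0) ? fallback : hw;
    return std::max(1u, std::min(n, P));
}
```

It would be called as `choose_thread_count(n, std::thread::hardware_concurrency())`. Passing the hardware value in as a parameter keeps the function easy to test, but it does not address the hyperthreading concern from the comment above: P is still only a hint.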