I have this serial code:
std::vector<T> input, output;
//populate input
for (size_t i=0 ; i < input.size() ; i++){
output.push_back(//something using input[i]);
}
Note that it's crucial that output[i]
is computed using input[i]
. Now I want to parallelize it. A straightforward solution is:
std::vector<T> input, output;
//populate input
#pragma omp for schedule(dynamic) ordered
for (size_t i=0 ; i < input.size() ; i++){
#pragma omp ordered
output.push_back(//something using input[i]);
}
However, this could be very inefficient (ordered
requires to get a lock, which generates a big overhead if performed many times, and in addition if chunks size are small this generates even more overhead). Another possible solution could be populate each chunk separately and then merge them (in an ordered way):
std::vector<T> input, output;
//populate input
#pragma omp parallel
{
std::vector<T> myOutput;
std::vector<int> myIndexes;
#pragma omp for schedule(static,1) //NOTICE STATIC!
for (size_t i=0 ; i < input.size() ; i++){
myOutput.push_back(//something using input[i]);
myIndexes.push_back[i];
}
#pragma omp for ordered schedule(static)
for(int i=0 ; i<omp_get_num_threads() ; i++){
#pragma omp ordered
{
std::cout<<"Hello from thread "<<omp_get_thread_num()<<": ";
for(size_t j = 0; j < myIndexes.size(); j++)
std::cout<<indexes[j]<<" ";
std::cout<<std::endl;
}
}
}
Supposing that input.size()=10
and we have 8 cores. This is what I want to be printed:
Hello from thread 0: 0 1
Hello from thread 1: 2 3
Hello from thread 2: 4
Hello from thread 3: 5
Hello from thread 4: 6
Hello from thread 5: 7
Hello from thread 6: 8
Hello from thread 7: 9
But this is what it's actually printed:
Hello from thread 0: 0 8
Hello from thread 1: 1 9
Hello from thread 2: 2
Hello from thread 3: 3
Hello from thread 4: 4
Hello from thread 5: 5
Hello from thread 6: 6
Hello from thread 7: 7
The code in this question could solve my case, but this is kinda cheating (it's not actually using #pragma omp parallel for
).
Is it possible to obtain my desired behavior using only openmp?