I've written a C++ app that has to process a lot of data. Using OpenMP I parallelized the processing phase quite well and, embarrassingly, found that writing the output is now the bottleneck. I decided to use a parallel for there as well, since the order in which I output items is irrelevant; they just need to come out as coherent chunks.
Below is a simplified version of the output code, showing all the variables except for two custom iterators in the loop that collects data into "related". My question is: is this the correct and optimal way to solve this problem? I read about the barrier pragma; do I need it here?
long n = nrows();
#pragma omp parallel for
for (long i = 0; i < n; i++) {
    // Collect this row's related items (the custom iterators are omitted here).
    std::vector<MyData> related;
    for (size_t j = 0; j < data[i].size(); j++)
        related.push_back(data[i][j]);
    std::sort(related.rbegin(), related.rend());  // descending order

    // Serialize the writes so each row's chunk stays contiguous.
    #pragma omp critical
    {
        std::cout << data[i].label << "\n";
        for (size_t j = 0; j < related.size(); j++)
            std::cout << "  " << related[j].label << "\n";
    }
}
(I tagged this question c as well, since I imagine OpenMP is very similar in C and C++. Please correct me if I'm wrong.)