I have this sequential code:
for (unsigned item = 0; item < totalItems; ++item) { // Outer loop
    // Outer body
    for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
        // Inner body
    }
}
My goal is to simply parallelize the inner loop. It could be done like this:
for (unsigned item = 0; item < totalItems; ++item) { // Outer loop
    // Outer body
    #pragma omp parallel for
    for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
        // Inner body
    }
}
The problem with this code is that a new team of threads is spawned on every iteration of the outer loop. To speed this up, I want to create a team of threads in advance and reuse it multiple times. I found that the directive #pragma omp for exists for this purpose:
#pragma omp parallel
for (unsigned item = 0; item < totalItems; ++item) { // Outer loop
    // Outer body
    #pragma omp for
    for (unsigned j = 0; j < maxSize; ++j) { // Inner loop
        // Inner body
    }
}
However, if I understand it correctly, using #pragma omp parallel this way means that every thread in the team executes the whole outer loop, so the outer loop body is run multiple times. Is this correct?
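To illustrate my concern: if every thread really does execute the outer loop, I would expect the outer body to need something like #pragma omp single so that it runs only once per iteration, while the inner loop stays work-shared. A rough sketch of what I mean (this is my assumption, not code I have verified):

#pragma omp parallel
for (unsigned item = 0; item < totalItems; ++item) { // Outer loop, executed by every thread
    #pragma omp single
    {
        // Outer body: with "single", only one thread per iteration would run this
    }
    #pragma omp for
    for (unsigned j = 0; j < maxSize; ++j) { // Inner loop, iterations shared among the team
        // Inner body
    }
}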
Edit: Here is a more detailed example:
// Let's say that an image is represented as an array of pixels,
// where a pixel is just one integer.
std::vector<Image> images = getImages();

for (auto & image : images) { // Loop over all images
    #pragma omp parallel for
    for (unsigned j = 0; j < image.size(); ++j) { // Loop over each pixel
        image.at(j) += addMagicConstant(j);
    }
}
Goal: I want to spawn a team of threads once and then use them repeatedly to parallelize only the inner loop (= the loop over the image pixels).
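For clarity, this is roughly the structure I have in mind for the image example (a sketch of my understanding, assuming the outer loop body contains nothing besides the inner pixel loop; Image, getImages and addMagicConstant are just the placeholders from the example above):

#pragma omp parallel
for (auto & image : images) { // Every thread walks the same sequence of images
    // Work-share the pixel loop among the existing team; the implicit barrier
    // at the end of the "for" construct keeps the threads in step per image.
    #pragma omp for
    for (unsigned j = 0; j < image.size(); ++j) {
        image.at(j) += addMagicConstant(j);
    }
}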