Assume we have two nested loops. The inner loop should be parallelized, but the outer loop needs to be executed sequentially. Then the following code does what we want:
for (int i = 0; i < N; ++i) {
    #pragma omp parallel for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
        // Do some work
    }
}
Now assume that each thread has to obtain some thread-local object to carry out the work in the inner loop, and that getting these thread-local objects is costly. Therefore, we don't want to do the following:
for (int i = 0; i < N; ++i) {
    #pragma omp parallel for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
        ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Costly!
        // Do some work with the help of obj
    }
}
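To make the cost concrete, here is a minimal sketch of the version above that simply counts the lookups. ThreadLocalObject, GetTLO, first and last are hypothetical stand-ins (GetTLO only increments a counter in place of the expensive work), and the omp_* fallbacks exist only so the sketch also builds without OpenMP. The lookup runs once per inner iteration, i.e. the sum over i of (last(i) - first(i)) times, rather than once per thread.

```cpp
#include <atomic>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also compiles without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical per-thread object and lookup function.
struct ThreadLocalObject { long scratch = 0; };

static std::vector<ThreadLocalObject> g_tlos;  // one object per thread
static std::atomic<long> g_lookups{0};         // counts GetTLO calls

ThreadLocalObject &GetTLO(int tid) {
    ++g_lookups;  // stands in for the costly part
    return g_tlos[tid];
}

// Hypothetical inner-loop bounds: last(i) - first(i) == i + 5.
int first(int i) { return i; }
int last(int i) { return 2 * i + 5; }

// The pattern from the question: GetTLO is called on every inner iteration.
long naive_lookups(int N) {
    g_tlos.assign(omp_get_max_threads(), ThreadLocalObject{});
    g_lookups = 0;
    for (int i = 0; i < N; ++i) {
        #pragma omp parallel for schedule(static)
        for (int j = first(i); j < last(i); ++j) {
            ThreadLocalObject &obj = GetTLO(omp_get_thread_num());  // once per j
            obj.scratch += j;
        }
    }
    return g_lookups.load();
}
```

With these bounds, naive_lookups(10) performs 45 + 50 = 95 lookups no matter how many threads are used, because each inner iteration executes exactly once.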
How can I solve this issue? The requirements are:
- Each thread should ask for its local object only once.
- The inner loop should be parallelized among all threads.
- The iterations of the outer loop should be executed one after the other.
My idea is the following, but does it really do what I want?
#pragma omp parallel
{
    ThreadLocalObject &obj = GetTLO(omp_get_thread_num());
    for (int i = 0; i < N; ++i) {
        #pragma omp for schedule(static)
        for (int j = first(i); j < last(i); ++j) {
            // Do some work with the help of obj
        }
    }
}
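A minimal sketch of the proposed pattern, again with hypothetical stand-ins (ThreadLocalObject, GetTLO, and a fixed inner range 0..M-1 in place of first(i)/last(i)). It relies on two OpenMP rules: every thread in the team must reach the omp for with the same loop bounds, and an omp for without nowait ends with an implicit barrier, so no thread starts outer iteration i + 1 until all of iteration i is finished. To make that observable, row i reads row i - 1 at a different j, which another thread may have written.

```cpp
#include <atomic>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also compiles without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical per-thread object and lookup function.
struct ThreadLocalObject { long scratch = 0; };

static std::vector<ThreadLocalObject> g_tlos;  // one object per thread
static std::atomic<long> g_lookups{0};         // counts GetTLO calls

ThreadLocalObject &GetTLO(int tid) {
    ++g_lookups;  // stands in for the costly part
    return g_tlos[tid];
}

struct Result { long lookups; int team_size; bool rows_ok; };

// One parallel region; every thread runs the outer loop, "omp for" splits the inner one.
Result run_proposed(int N, int M) {
    g_tlos.assign(omp_get_max_threads(), ThreadLocalObject{});
    g_lookups = 0;
    // rows[i][j] depends on rows[i-1][(j+1) % M], so it is only correct
    // if row i-1 is complete before any thread starts row i.
    std::vector<std::vector<int>> rows(N, std::vector<int>(M, 0));
    int team_size = 1;
    #pragma omp parallel
    {
        ThreadLocalObject &obj = GetTLO(omp_get_thread_num());  // once per thread
        #pragma omp single
        team_size = omp_get_num_threads();
        for (int i = 0; i < N; ++i) {
            // Same bounds on every thread; the implicit barrier at the end
            // of this worksharing loop keeps the outer iterations in order.
            #pragma omp for schedule(static)
            for (int j = 0; j < M; ++j) {
                int prev = (i == 0) ? 0 : rows[i - 1][(j + 1) % M];
                rows[i][j] = prev + 1;
                obj.scratch += j;
            }
        }
    }
    bool ok = true;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < M; ++j)
            ok = ok && rows[i][j] == i + 1;
    return {g_lookups.load(), team_size, ok};
}
```

If run_proposed reports lookups == team_size and rows_ok is true, each thread fetched its object exactly once and the outer iterations did not overlap. Adding nowait to the omp for would remove the barrier, and rows_ok could then fail.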