Parallelizing an inner loop with OpenMP

Question

Assume we have two nested loops. The inner loop should be parallelized, but the outer loop needs to be executed sequentially. Then the following code does what we want:

for (int i = 0; i < N; ++i) {
  #pragma omp parallel for schedule(static)
  for (int j = first(i); j < last(i); ++j) {
    // Do some work
  }
}

Now assume that each thread has to obtain some thread-local object to carry out the work in the inner loop, and that getting these thread-local objects is costly. Therefore, we don't want to do the following:

for (int i = 0; i < N; ++i) {
  #pragma omp parallel for schedule(static)
  for (int j = first(i); j < last(i); ++j) {
    ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Costly!
    // Do some work with the help of obj
  }
}
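To make the cost concrete, here is a minimal C++ sketch of the loop above with an instrumented lookup. The names `ThreadLocalObject`, `GetTLO`, `first` and `last` are hypothetical stand-ins. The lookup only counts its calls and returns a per-thread slot. The `#ifdef _OPENMP` fallback exists only so the sketch also builds without `-fopenmp`. The number of lookups equals the total number of inner iterations, however many threads run.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical per-thread object; real code would hold something expensive.
struct ThreadLocalObject {
    long work = 0;
};

// Hypothetical inner-loop bounds.
int first(int i) { return i; }
int last(int i) { return 2 * i + 3; }

std::vector<ThreadLocalObject> g_objects;  // one slot per thread
std::atomic<long> g_lookups{0};            // how often GetTLO() ran

// Hypothetical stand-in for the costly lookup: counts every call.
ThreadLocalObject &GetTLO(int tid) {
    ++g_lookups;
    assert(tid >= 0 && tid < static_cast<int>(g_objects.size()));
    return g_objects[tid];
}

// Naive version: a new parallel region per outer iteration and one
// lookup per inner iteration. Returns the number of lookups performed.
long count_lookups_naive(int N) {
    g_objects.assign(omp_get_max_threads(), ThreadLocalObject{});
    g_lookups = 0;
    for (int i = 0; i < N; ++i) {
        #pragma omp parallel for schedule(static)
        for (int j = first(i); j < last(i); ++j) {
            ThreadLocalObject &obj = GetTLO(omp_get_thread_num());  // Costly!
            obj.work += j;
        }
    }
    return g_lookups.load();
}
```

With these stand-in bounds, `count_lookups_naive(10)` performs 75 lookups, one per inner iteration.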

How can I solve this issue?

- Each thread should ask for its local object only once.
- The inner loop should be parallelized among all threads.
- The iterations of the outer loop should be executed one after the other.

My idea is the following, but does it really do what I want?

#pragma omp parallel
{
  ThreadLocalObject &obj = GetTLO(omp_get_thread_num());
  for (int i = 0; i < N; ++i) {
    #pragma omp for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
      // Do some work with the help of obj
    }
  }
}
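A minimal sketch of this idea suggests that it does what is wanted. It uses the same kind of hypothetical stand-ins: `ThreadLocalObject`, `GetTLO`, `first` and `last` are illustrative and not taken from the question. Each thread performs its lookup once. The implicit barrier at the end of every `omp for` keeps the outer iterations in order. The `violations` counter checks that no inner iteration of step `i` starts before all of step `i-1` has finished.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical per-thread object and inner-loop bounds.
struct ThreadLocalObject {
    long work = 0;
};
int first(int i) { return i; }
int last(int i) { return 2 * i + 3; }

std::vector<ThreadLocalObject> g_objects;  // one slot per thread
std::atomic<long> g_lookups{0};            // how often GetTLO() ran

// Hypothetical stand-in for the costly lookup: counts every call.
ThreadLocalObject &GetTLO(int tid) {
    ++g_lookups;
    assert(tid >= 0 && tid < static_cast<int>(g_objects.size()));
    return g_objects[tid];
}

struct Stats {
    long lookups = 0;     // calls to GetTLO()
    int team_size = 0;    // threads in the parallel region
    long work = 0;        // sum of all obj.work values
    long violations = 0;  // inner iterations that started too early
};

// One parallel region; every thread runs the sequential outer loop.
Stats run_hoisted(int N) {
    g_objects.assign(omp_get_max_threads(), ThreadLocalObject{});
    g_lookups = 0;
    std::atomic<long> completed{0};  // inner iterations finished so far
    std::atomic<long> violations{0};
    int team_size = 0;

    #pragma omp parallel
    {
        ThreadLocalObject &obj = GetTLO(omp_get_thread_num());  // once per thread
        #pragma omp single
        team_size = omp_get_num_threads();

        long before = 0;  // inner iterations of all earlier outer steps
        for (int i = 0; i < N; ++i) {
            #pragma omp for schedule(static)
            for (int j = first(i); j < last(i); ++j) {
                // Step i must not begin before step i-1 has fully finished.
                if (completed.load() < before) ++violations;
                obj.work += j;
                ++completed;
            }
            // The implicit barrier at the end of `omp for` is what keeps
            // the outer iterations sequential.
            before += last(i) - first(i);
        }
    }

    Stats s;
    s.lookups = g_lookups.load();
    s.team_size = team_size;
    for (const auto &o : g_objects) s.work += o.work;
    s.violations = violations.load();
    return s;
}
```

This relies on every thread encountering the same sequence of `omp for` constructs with the same bounds. That holds here because `first(i)` and `last(i)` depend only on `i`. Adding `nowait` to the `omp for` would remove the barrier, and with it the guarantee that the outer iterations run one after the other.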

@HighPerformanceMark, typos aside, I think the OP's question is interesting, which is rare for OpenMP questions lately. Do you have any comments on my solution using `threadprivate`? Did I make a mistake (I almost only use C now and my C++ is super rusty)? — Z boson, Apr 15 '15 at 07:41
Your method is fine. Could you tell me why you need a thread local object initialized to the thread number? — Z boson, Apr 16 '15 at 13:33

score 1 · Answer 1 · answered Apr 19 '15 at 19:43

I don't see why the extra complication of `threadprivate` is necessary when you can simply use a pool of objects. The basic idea goes along these lines:

#pragma omp parallel
{
  // Will hold a handle to the object pool
  auto pool = std::shared_ptr<ObjectPool>(nullptr);
  #pragma omp single copyprivate(pool)
  {
    // A single thread creates a pool of num_threads objects;
    // copyprivate broadcasts the handle to the other threads
    pool = create_object_pool(omp_get_num_threads());
  }
  for (int i = 0; i < N; ++i)
  {
    // `omp for`, not `parallel for`: we are already inside a parallel
    // region, and its implicit barrier keeps the outer loop sequential
    #pragma omp for schedule(static)
    for (int j = first(i); j < last(i); ++j)
    {
      // The object is not re-created; the pool just returns
      // a reference to it
      auto & r = pool->get( omp_get_thread_num() );
      // Do work with r
    }
  }
}
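Here is a self-contained version of the pool idea, as a minimal sketch. `ObjectPool`, `create_object_pool`, `ThreadLocalObject`, `first` and `last` are hypothetical, because the answer does not define them. The pool is simply a vector with one object per thread. The handle is broadcast with `single copyprivate`, as in the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical per-thread object; real code would hold something expensive.
struct ThreadLocalObject {
    long work = 0;
};

// Hypothetical pool: one pre-built object per thread, indexed by thread id.
class ObjectPool {
public:
    explicit ObjectPool(int n) : objects_(static_cast<std::size_t>(n)) {}

    ThreadLocalObject &get(int tid) {
        assert(tid >= 0 && tid < static_cast<int>(objects_.size()));
        return objects_[static_cast<std::size_t>(tid)];
    }

    long total_work() const {
        long sum = 0;
        for (const auto &o : objects_) sum += o.work;
        return sum;
    }

private:
    std::vector<ThreadLocalObject> objects_;
};

// Hypothetical factory: the costly construction happens here, exactly once.
std::shared_ptr<ObjectPool> create_object_pool(int n) {
    return std::make_shared<ObjectPool>(n);
}

// Hypothetical inner-loop bounds.
int first(int i) { return i; }
int last(int i) { return 2 * i + 3; }

// Runs the pooled loop nest and returns the sum of all per-thread work.
long run_with_pool(int N) {
    std::shared_ptr<ObjectPool> result;  // shared: outlives the region
    #pragma omp parallel
    {
        std::shared_ptr<ObjectPool> pool;  // private handle per thread
        // One thread builds the pool; copyprivate copies the handle to all.
        #pragma omp single copyprivate(pool)
        pool = create_object_pool(omp_get_num_threads());

        for (int i = 0; i < N; ++i) {
            #pragma omp for schedule(static)
            for (int j = first(i); j < last(i); ++j) {
                ThreadLocalObject &r = pool->get(omp_get_thread_num());
                r.work += j;
            }
        }

        // Keep the pool alive after the region so the result can be read.
        #pragma omp single
        result = pool;
    }
    return result->total_work();
}
```

With these stand-in bounds, `run_with_pool(10)` returns 705, the same sum a serial double loop gives. Creating the pool inside the region makes its size match the actual team size. Creating it before the region with `omp_get_max_threads()` slots would also work.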
