I'm trying to write parallel code with OpenMP for a function that has the following structure:

1. Begin of data-dependent loop
    2. Some computation
    3. If the result of 2 equals 0 then
        3.1. Begin of data-independent loop
        3.2. Some computation
        3.3. End of data-independent loop
    4. Some computation by a single thread
    5. Begin of data-independent loop
        6. Some computation
    7. End of data-independent loop
8. End of data-dependent loop    

I'd like to open a single parallel region around the whole outer loop, something like this:

#pragma omp parallel
1. Begin of data-dependent loop
    #pragma omp master
    2. Some computation by a single thread
    3. If the result of 2 equals 0 then
        #pragma omp for
        3.1. Begin of data-independent loop
        3.2. Some computation
        3.3. End of data-independent loop
    4. Some computation by a single thread
    #pragma omp for
    5. Begin of data-independent loop
        6. Some computation
    7. End of data-independent loop
8. End of data-dependent loop

However, the compiler does not allow me to nest #pragma omp for inside #pragma omp master. Is there a solution other than changing both inner loops to #pragma omp parallel for and giving up on forking once, outside the main loop?

Let me know if it isn't clear enough.

Thanks in advance

gcolucci

1 Answer

Yes, just parallelize the inner loops like this:

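for(int i=0; i<n; i++) {
   int cut = foo(i);            // step 2: serial
   if(!cut) {
       #pragma omp parallel for
       for(int j=0; j<m; j++) {
           // step 3.2: independent work on element j
       }
   }
   foo2();                      // step 4: serial
   #pragma omp parallel for
   for(int j=0; j<k; j++) {
       // step 6: independent work on element j
   }
}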

This is efficient because OpenMP implementations typically create a pool of threads the first time a parallel region is entered and reuse it for the following parallel regions, i.e. the threads are not created and destroyed between parallel regions. This is one of the nice features of OpenMP, in my opinion. It's fairly easy to build a toy OpenMP-like model using e.g. pthreads with static scheduling, but implementing a thread pool is more difficult. Note that nothing requires an OpenMP implementation to use a pool, but every implementation I have used does.
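To make the pattern above concrete, here is a minimal compilable sketch. The bodies of foo and foo2, the function name run_scheme, the arrays a and b, and the carry variable are all made up for illustration; only the loop structure comes from the snippet above. It assumes the inner loops touch disjoint elements, so no extra synchronization is needed.

```c
#include <stddef.h>

/* Hypothetical step 2: serial computation whose result gates step 3. */
static int foo(int i) { return i % 2; }

/* Hypothetical step 4: serial work that carries state across iterations. */
static int foo2(int carry, int i) { return carry + i; }

/* Outer loop stays serial; a parallel region is forked only around
   each inner data-independent loop. Returns the final carried value. */
int run_scheme(int n, int *a, int m, int *b, int k)
{
    int carry = 0;
    for (int i = 0; i < n; i++) {
        int cut = foo(i);
        if (!cut) {
            #pragma omp parallel for
            for (int j = 0; j < m; j++)
                a[j] += 1;              /* each j writes its own element */
        }
        carry = foo2(carry, i);         /* runs on the initial thread only */
        #pragma omp parallel for
        for (int j = 0; j < k; j++)
            b[j] += carry;              /* carry is only read here */
    }
    return carry;
}
```

Built with an OpenMP flag (for example -fopenmp with GCC) the inner loops are shared among threads; built without it the pragmas are ignored and the function gives the same result serially.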

See cholesky-decomposition-with-openmp for an example parallelizing the inner loop.
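For completeness, the single-region layout from the question can also be written legally. The sketch below is an assumption-laden illustration rather than part of the original answer: it replaces master with single, because single ends with an implicit barrier while master does not, and it keeps the if outside the single block so that every thread reaches each omp for. The function name run_scheme_one_region and the placeholder computations are hypothetical.

```c
#include <stddef.h>

/* One parallel region around the whole outer loop. Every thread runs
   the outer loop; serial steps are wrapped in "single". */
int run_scheme_one_region(int n, int *a, int m, int *b, int k)
{
    int carry = 0;  /* shared: written in single, read by all threads */
    int cut = 0;    /* shared: written in single, read by all threads */

    #pragma omp parallel
    {
        for (int i = 0; i < n; i++) {       /* i is private per thread */
            #pragma omp single
            cut = i % 2;                    /* step 2 (placeholder) */
            /* implicit barrier: all threads now see the same cut */

            if (!cut) {                     /* same branch on every thread */
                #pragma omp for
                for (int j = 0; j < m; j++)
                    a[j] += 1;
            }

            #pragma omp single
            carry += i;                     /* step 4 (placeholder) */

            #pragma omp for
            for (int j = 0; j < k; j++)
                b[j] += carry;
            /* implicit barrier before the next iteration rewrites cut */
        }
    }
    return carry;
}
```

Whether this beats the per-loop parallel for version depends on the implementation and the loop sizes, since the fork cost it avoids is traded for extra barriers, so it is worth measuring both.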

Z boson