OpenMP unequal load without for loop

Question

I have an OpenMP code that looks like the following

while(counter < MAX)  {
  #pragma omp parallel reduction(+:counter) 
  {
     // do monte carlo stuff
     // if a certain condition is met, counter is incremented

  }
}

Hence, the idea is that the parallel region gets executed by the available threads as long as the counter is below a certain value. Depending on the scenario (I am doing MC stuff here, so it is random), some computations might take longer than others. This creates an imbalance between the workers, which becomes apparent because of the implicit barrier at the end of the parallel region.

It seems like #pragma omp parallel for might have ways to circumvent this (i.e. the nowait clause or dynamic scheduling), but I can't use it, as I don't know an upper bound on the number of iterations for the for loop.

Any ideas or design patterns for dealing with such a situation?

Best regards!

Zulan · Accepted Answer · 2017-11-21T15:17:39.007


Run everything in a single parallel region and access the counter atomically.

int counter = 0;
#pragma omp parallel
while(1) {
     int local_counter;
     #pragma omp atomic read
     local_counter = counter;
     if (local_counter >= MAX) {
          break;
     }
     // do monte carlo stuff
     // if a certain condition is met, counter is incremented
     if (certain_condition) {
         #pragma omp atomic update
         counter++;
     }
}

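You can't do the check directly in the while condition, because the atomic read has to be a separate statement under its own pragma. Note that this code can overshoot, i.e. counter > MAX is possible after the parallel region: several threads may each read a value below MAX and then all increment. Keep in mind that counter is shared and read/updated by many threads.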
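For reference, here is a minimal compilable sketch of the same pattern. The function name `count_hits`, the `xorshift32` generator, and the "hit with probability about 1/4" trial are all assumptions standing in for the real Monte Carlo work, not part of the original code. It also builds without OpenMP enabled, in which case the pragmas are ignored and the loop runs serially.

```c
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for "do monte carlo stuff": one xorshift32 step. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Runs trials on all threads until the shared counter reaches max_hits.
   Returns the final counter, which may overshoot max_hits because each
   other thread can still finish one in-flight trial. */
int count_hits(int max_hits) {
    int counter = 0;
    #pragma omp parallel shared(counter)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Per-thread RNG state; "| 1u" keeps the seed nonzero. */
        uint32_t rng = (2463534242u + 7919u * (uint32_t)tid) | 1u;

        for (;;) {
            int local_counter;
            #pragma omp atomic read
            local_counter = counter;
            if (local_counter >= max_hits) {
                break;  /* leaves the for loop, not the parallel region */
            }
            /* One trial: count a "hit" when the two low bits are zero. */
            if ((xorshift32(&rng) & 3u) == 0u) {
                #pragma omp atomic update
                counter++;
            }
        }
    }
    return counter;
}
```

With GCC or Clang this would typically be built with `-fopenmp`. Because there is no barrier inside the loop, a thread that finishes a trial early simply starts the next one, which is what removes the load imbalance.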

edited Nov 21 '17 at 15:17

answered Nov 20 '17 at 20:42


Thank you - I will try it out immediately. I guess the reduction clause is then no longer needed, right? I mean, using the atomic operations, the code is already taking care of this, right? – tomseidel1 Nov 20 '17 at 21:32
Yes, the reduction clause is wrong in this case - sorry and good catch. Atomics will take care that 1) `counter` is updated the correct number of times in total and 2) you always read a valid value. – Zulan Nov 20 '17 at 21:39
One more general question: What's the best way to handle a situation where the threads have to access a complicated struct (by this I mean, for example, one with members that are dynamically allocated arrays) in a read-only manner? My current approach was simply shared and I did not notice any issues. However, I am wondering: Why would I need the #pragma omp atomic read then? Would firstprivate be more appropriate for the struct? – tomseidel1 Nov 21 '17 at 19:24
If there are only read accesses from all threads, it can and should be shared without atomic accesses. The atomic read in my example is necessary because other threads write to `counter` at the same time. Note: declaring a struct `firstprivate` copies only the struct itself; it does not copy the memory reached through its pointers / dynamically allocated members, which stays shared. – Zulan Nov 21 '17 at 20:39
Thanks, this is really useful. I wonder why it was so hard to find a good while loop example using openmp. – thc Dec 12 '18 at 23:30
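
The `firstprivate` remark in the comments above can be illustrated with a small sketch. The struct `params_t` and the function `firstprivate_is_shallow` are hypothetical names made up for this example. Each thread gets its own copy of the struct, but the copied pointer still refers to the single original array.

```c
#include <stdlib.h>

/* Hypothetical parameter struct with a dynamically allocated member. */
typedef struct {
    int n;
    double *weights;
} params_t;

/* Returns 1 if a write made through a thread's firstprivate copy is
   visible in the original array, 0 if not, -1 on allocation failure. */
int firstprivate_is_shallow(void) {
    params_t p;
    p.n = 4;
    p.weights = calloc((size_t)p.n, sizeof *p.weights);
    if (p.weights == NULL) {
        return -1;
    }

    #pragma omp parallel firstprivate(p) num_threads(2)
    {
        /* p is a per-thread copy of the struct, but p.weights holds the
           same address in every copy, so this writes the shared array.
           The atomic write avoids a data race between the threads. */
        #pragma omp atomic write
        p.weights[0] = 1.0;

        /* With OpenMP enabled, this changes only the thread's own copy. */
        p.n = 0;
    }

    int shared_through_pointer = (p.weights[0] == 1.0);
    free(p.weights);
    return shared_through_pointer;
}
```

For purely read-only data, as in the comment, none of this copying is needed: leaving the struct shared and reading it without atomics is sufficient.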
