
I have code that runs many iterations, and the result of an iteration is saved only if a condition is met. This is naturally expressed as a while loop. I am attempting to make the code run in parallel, since each realisation is independent. So I have this:

while(nit<avit){
    #pragma omp parallel shared(nit,avit)
    {
        //do some stuff
        if(condition){
            #pragma omp critical
            {
                nit++;
                //save results
            }
        }
    }//implicit barrier here
}

and this works fine... but there is an implicit barrier at the end of the parallel region after each realisation, which means that if the stuff I am doing inside the parallel block takes longer in one iteration than in the others, all my threads wait for it to finish instead of continuing with the next iteration.

Is there a way to avoid this barrier so that the threads keep working? I am averaging thousands of iterations, so a few extra don't hurt (in case nit has not yet been incremented by threads that are still running)...

I have tried to turn this into a parallel for, but the automatic increment in the for loop makes the nit variable go wild. This is my attempt:

#pragma omp parallel shared(nit,avit)
{
    #pragma omp for
    for(nit=0;nit<avit;nit++){
        //do some stuff
        if(condition){
            //save results
        } else {
            #pragma omp critical
            {
                nit--;
            }
        }
    }
}

and it keeps working and going around the for loop, as expected, but my nit variable takes unpredictable values... as one would expect when it is incremented and decremented by different threads at different times.

I have also tried leaving the increment in the for loop blank, but it doesn't compile, and tricking the loop into having no effective increment, like

...
incr=0;
for(nit=0;nit<avit;nit+=incr)
...

but then my code crashes...

Any ideas?

Thanks

Edit: Here's a working minimal example of the code using a while loop:

#include <random>
#include <vector>
#include <iostream>
#include <time.h>
#include <omp.h>
#include <stdlib.h>
#include <unistd.h>

using namespace std;

int main(){

    int nit,dit,avit=100,t,j,tmax=100,jmax=10;
    vector<double> Res(10),avRes(10);

    nit=0; dit=0;
    while(nit<avit){
        #pragma omp parallel shared(tmax,nit,jmax,avRes,avit,dit) private(t,j) firstprivate(Res)
        {
            srand(int(time(NULL)) ^ omp_get_thread_num()); // seed rand() differently in each thread
            t=0; j=0;
            while(t<tmax&&j<jmax){
                Res[j]=rand() % 10;
                t+=Res[j];
                if(omp_get_thread_num()==5){ // artificially slow down one thread
                    usleep(100000);
                }
                j++;
            }
            if(t<tmax){ // condition met: accumulate this realisation
                #pragma omp critical
                {
                    nit++;
                    for(j=0;j<jmax;j++){
                        avRes[j]+=Res[j];
                    }
                    for(j=0;j<jmax;j++){
                        cout<<avRes[j]/nit<<"\t";
                    }
                    cout<<" \t nit="<<nit<<"\t thread: "<<omp_get_thread_num();
                    cout<<endl;
                }
            } else{
                #pragma omp critical
                {
                    dit++;
                    cout<<"Discarded: "<<dit<<"\r"<<flush;
                }
            }
        }
    }
    return 0;
}

I added the usleep part to simulate one thread taking longer than the others. If you run the program, all threads have to wait for thread 5 to finish, and then they start the next run. What I am trying to do is precisely to avoid that wait, i.e. I'd like the other threads to pick up the next iteration without waiting for thread 5 to finish.

  • It's not really possible to answer the question properly without a more specific understanding of *some stuff*. Particularly, we have no idea whether `nit` is accessed in *some stuff*, what is supposed to happen when multiple threads meet `condition` at the same time, or whether `nit` was updated multiple times while one thread was doing *some stuff*... It's hard, but for a good, specific answer you have to create a [mcve]. – Zulan Nov 18 '17 at 10:13
  • Thanks for the comment @Zulan. I edited the question to add a minimal working example at the end. Hope this clarifies things. – Francisco Herrerias-Azcue Nov 21 '17 at 11:12

1 Answer


You can basically follow the same concept as for this question, with a slight variation to ensure that avRes is not written to in parallel:

int nit = 0;
#pragma omp parallel
while(1) {
    int local_nit;
    #pragma omp atomic read
    local_nit = nit;
    if (local_nit >= avit) {
        break;
    }

    [...]

    if (...) {
        #pragma omp critical
        {
            #pragma omp atomic capture
            local_nit = ++nit;
            for(j=0;j<jmax;j++){
                avRes[j] += Res[j];
            }
            for(j=0;j<jmax;j++){
                // technically you could also use `nit` directly since
                // now `nit` is only modified within this critical section
                cout<<avRes[j]/local_nit<<"\t";
            }
        }
    } else {
        #pragma omp atomic update
        dit++;
    }
}

It also works with critical regions, but atomics are more efficient.
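
For comparison, a minimal sketch of the same counter check done with a critical section instead of the atomic read (same variable names as above); every thread then takes the same lock on every iteration just to read one integer, whereas the atomic read typically maps to a single atomic load:

int local_nit;
#pragma omp critical
{
    local_nit = nit;   // all threads serialise here, once per iteration
}
if (local_nit >= avit) {
    break;
}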

There's another thing you need to consider: rand() should not be used in parallel contexts. See this question. For C++, use a private (i.e. defined within the parallel region) random number generator from <random>.
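
Putting the two points together, here is a rough sketch of how the question's example could look with the loop moved inside the parallel region and a thread-local <random> engine. It keeps the question's variable names; treat it as an illustration of the pattern rather than a drop-in replacement (near the end, nit can overshoot avit by a few iterations, which the question says is acceptable):

#include <random>
#include <vector>
#include <iostream>
#include <omp.h>

using namespace std;

int main(){

    int nit=0, dit=0, avit=100, tmax=100, jmax=10;
    vector<double> avRes(jmax);

    #pragma omp parallel shared(nit,dit,avRes)
    {
        // one generator per thread, seeded differently in each thread
        mt19937 gen(random_device{}() + omp_get_thread_num());
        uniform_int_distribution<int> dist(0,9);
        vector<double> Res(jmax);

        while(true){
            int local_nit;
            #pragma omp atomic read
            local_nit = nit;
            if(local_nit>=avit) break;      // enough accepted realisations

            int t=0, j=0;
            while(t<tmax && j<jmax){
                Res[j]=dist(gen);
                t+=Res[j];
                j++;
            }

            if(t<tmax){                     // condition met: save this realisation
                #pragma omp critical
                {
                    #pragma omp atomic capture
                    local_nit = ++nit;      // keep the update atomic, as above
                    for(j=0;j<jmax;j++){
                        avRes[j]+=Res[j];
                    }
                    for(j=0;j<jmax;j++){
                        cout<<avRes[j]/local_nit<<"\t";
                    }
                    cout<<" \t nit="<<local_nit<<"\t thread: "<<omp_get_thread_num()<<endl;
                }
            } else {
                #pragma omp atomic update
                dit++;
            }
        }
    }
    return 0;
}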

Zulan
  • Thanks @Zulan, with minor modifications this seems to be (almost) doing the trick! I am not very familiar with `atomic`, but it would not let me add a block of code as with `critical`... I need that all threads are able to write results, and that they are written more often than just at the end... If I do `#pragma omp critical { local_nit = nit++; for(j=0;j – Francisco Herrerias-Azcue Nov 23 '17 at 08:43
  • Never mind, I understand now that `atomic` can only be used for single line instructions. So the final code has to have the `critical` section for writing. And just as a final correction, I believe the `local_nit = nit++;` should simply be `nit++;`, since `local_nit` is updated above, in `#pragma omp atomic read local_nit = nit;` and the `if` that prevents writing until the end is not wanted... Thanks again for your help. – Francisco Herrerias-Azcue Nov 23 '17 at 10:28
  • My code was mistakenly using postfix instead of prefix increment. I fixed that and the link. With the correct form `local_nit = ++nit; if (local_nit == avit) ...` you can protect the writing of output without an expensive critical section. – Zulan Nov 23 '17 at 13:42
  • That makes a bit more sense... but this would again only write at the end of the program, right? I need to be able to check results more often, so I think I do need a critical section... Could I "hide it" within an `if`?: `#pragma omp atomic capture local_wnit=++wnit; if(local_wnit==wavit){ #pragma omp critical { \\write; wnit-=local_wnit;} }` where `wnit` is a global write counter `local_wnit` the locally handled variable and `wavit` a shared variable with the number of iterations between write events. Would this help or is it the same? Thanks for fixing the link. – Francisco Herrerias-Azcue Nov 24 '17 at 08:48
  • If you need to update the result for every increment (sorry, I misread your question), you might as well just use a critical section for the update. See my edited answer. Alternatively, you can do a reduction if it's just about summing up and performance is an issue with the critical section - or, if `jmax` is rather small, do every result update atomically. Most importantly, you don't want to read the counter in a critical section. – Zulan Nov 24 '17 at 10:00
  • Thanks @Zulan. Why is it bad to read the counter in the critical section? I need to output `avRes/nit`, so would it be better to do `#pragma omp atomic update local_nit=++nit; #pragma omp critical { for(j=0;j – Francisco Herrerias-Azcue Nov 27 '17 at 10:03
  • The counter is read on **every** iteration and a critical section is significantly more expensive than an atomic read. Yes, it would be better to use a local copy of nit for the average. However, taking that before the critical region is dangerous because then an outdated nit may be used. Instead, move the `++nit` inside the critical region. Then it doesn't matter whether you use `nit` or `local_nit`. Check my update. I didn't think the output was very important for your example. – Zulan Nov 27 '17 at 11:40
  • Ok great, that matches my code. Thanks a lot @Zulan. – Francisco Herrerias-Azcue Nov 28 '17 at 15:11