I have a simple dissemination barrier implemented in OpenMP that can deadlock, and I have not been able to figure out why.

The threads share a common data structure, `Flags`, which looks something like this:

  // each other.
  typedef struct Flags{
    int myflags[2][MAX_THREADS];
    int *partnerFlags[2][MAX_THREADS];
  }Flags;

    #pragma omp parallel shared(processors, rounds, totalTime)
    {
        int parity = 0;
        int sense = 1;
        int threadID = omp_get_thread_num();
        int i;
        Flags *localFlags = &processors[threadID];
        
        for(i=0; i<MAX_BARRIERS; i++) {
            double startTime = omp_get_wtime();
            disseminationBarrier(localFlags, &sense, &parity, &rounds);
            double endTime = omp_get_wtime();

            double elapsedTime = (endTime - startTime);

            // omp_get_wtime() returns seconds, so multiply by 1000 to report milliseconds.
            printf("Time spent at barrier %d (in ms) by thread %d is %lf\n", i, threadID, (elapsedTime * 1000.0));

            #pragma omp critical
            {
                totalTime = totalTime + elapsedTime;
            }
        }
    }

The code for the dissemination function looks something like this:

 void disseminationBarrier(Flags *localFlags, int *sense, int *parity, int *rounds){
    int i;
    for(i=0; i<*rounds; i++) {
        *localFlags->partnerFlags[*parity][i] = *sense;
        while(localFlags->myflags[*parity][i] != *sense);  // <- point of probable deadlock
    }

    if(*parity) {
        *sense = !*sense;
    }
    
    *parity = 1- *parity;
}
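
The question does not show how `processors` and `rounds` are set up, and a mis-wired partner pointer would hang the same spin loop. As a point of comparison only, here is a minimal sketch of the usual wiring for a dissemination barrier, in which thread `t` signals thread `(t + 2^k) mod P` in round `k`. The helper names `barrierRounds` and `initFlags` and the value of `MAX_THREADS` are assumptions made for illustration; they do not come from the question.

```c
#include <assert.h>

#define MAX_THREADS 64  /* assumed bound; the question does not give its value */

typedef struct Flags {
    int  myflags[2][MAX_THREADS];
    int *partnerFlags[2][MAX_THREADS];
} Flags;

/* Number of rounds: the smallest r with 2^r >= numThreads. */
static int barrierRounds(int numThreads) {
    int r = 0;
    while ((1 << r) < numThreads) r++;
    return r;
}

/* Hypothetical helper: clears every flag and points partnerFlags[p][k] of
 * thread t at myflags[p][k] of thread (t + 2^k) mod numThreads.
 * Returns the number of rounds to pass to the barrier. */
static int initFlags(Flags *processors, int numThreads) {
    assert(numThreads >= 1 && numThreads <= MAX_THREADS);
    int rounds = barrierRounds(numThreads);
    for (int t = 0; t < numThreads; t++) {
        for (int p = 0; p < 2; p++) {
            for (int k = 0; k < rounds; k++) {
                int partner = (t + (1 << k)) % numThreads;
                processors[t].myflags[p][k] = 0;
                processors[t].partnerFlags[p][k] =
                    &processors[partner].myflags[p][k];
            }
        }
    }
    return rounds;
}
```

This has to run serially, before the parallel region starts, so that every thread sees a fully wired array when it first enters the barrier.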
Anonymous
  • I'm not sure I understand the idea, but since there's no synchronization point, why would you expect `localFlags->myflags[*parity][i]` to suddenly change value? The compiler can even optimize the read out of the loop and spin there forever. So you need to either declare the variable `volatile` to avoid the caching and force an actual read of the value, or force a flush with `#pragma omp flush` (very error prone). But again, I might have missed the point here – Gilles Mar 12 '21 at 06:28
  • You will need the `flush` directive, as OpenMP's memory model does not require the compiler to re-read the actual value of the memory location in the while loop. BTW, `volatile` does not fix this: `volatile` only means that the compiler has to emit the read operation (in your case), but it does not say anything about the ordering of the memory operations. With `volatile`, that code may work on some architectures and fail horribly on others. – Michael Klemm Mar 12 '21 at 08:21
  • Another option for you is to use the `atomic` directive to do the reads and writes to your barrier state. There you can also add a memory ordering request that would then force the compiler to do the "right" thing. If your code is C++, then you can also do this with `std::atomic`. Bear in mind that the barrier also needs to include the proper memory fences, such that no memory operation from before can float to after the barrier completion and vice versa. – Michael Klemm Mar 12 '21 at 08:23
  • If you want to look at a Barrier Zoo (which includes a dissemination barrier) you can find one in the "Little OpenMP" (LOMP) library, along with a barrier benchmark. https://github.com/parallel-runtimes/lomp/ – Jim Cownie Mar 15 '21 at 11:34
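
One way to act on the `atomic` suggestion in the comments is sketched below: every flag write and every flag poll goes through `#pragma omp atomic ... seq_cst`, which forces a real memory access on each iteration and includes the flush that the plain spin loop lacks. This is a sketch under assumptions, not a verified fix for the original program: `MAX_THREADS`, `wireFlags`, `progress` and `runBarrierDemo` are made up for illustration, `rounds` is passed by value rather than by pointer, and the `seq_cst` clause needs a compiler supporting OpenMP 4.0 or later.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else  /* serial fallback so the sketch still builds without -fopenmp */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 64  /* assumed bound, not taken from the question */

typedef struct Flags {
    int  myflags[2][MAX_THREADS];
    int *partnerFlags[2][MAX_THREADS];
} Flags;

static Flags processors[MAX_THREADS];
static int   progress[MAX_THREADS];  /* last episode each thread announced */

/* Hypothetical setup: in round k, thread t signals thread (t + 2^k) mod n. */
static int wireFlags(int n) {
    int rounds = 0;
    while ((1 << rounds) < n) rounds++;  /* ceil(log2(n)) */
    for (int t = 0; t < n; t++) {
        progress[t] = 0;
        for (int p = 0; p < 2; p++)
            for (int k = 0; k < rounds; k++) {
                processors[t].myflags[p][k] = 0;
                processors[t].partnerFlags[p][k] =
                    &processors[(t + (1 << k)) % n].myflags[p][k];
            }
    }
    return rounds;
}

static void disseminationBarrier(Flags *localFlags, int *sense, int *parity,
                                 int rounds) {
    int s = *sense, p = *parity;
    for (int i = 0; i < rounds; i++) {
        /* Signal this round's partner with an atomic, sequentially consistent store. */
        #pragma omp atomic write seq_cst
        *localFlags->partnerFlags[p][i] = s;

        /* Poll our own flag; the atomic read cannot be hoisted out of the loop. */
        int seen;
        do {
            #pragma omp atomic read seq_cst
            seen = localFlags->myflags[p][i];
        } while (seen != s);
    }
    if (p) *sense = !s;
    *parity = 1 - p;
}

/* Runs `episodes` barriers and counts how often a thread left the barrier
 * before some other thread had announced its arrival (0 means none seen). */
int runBarrierDemo(int requestedThreads, int episodes) {
    assert(requestedThreads >= 1 && requestedThreads <= MAX_THREADS);
    int violations = 0, rounds = 0;
    #pragma omp parallel num_threads(requestedThreads) shared(violations, rounds)
    {
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
        #pragma omp single
        {
            rounds = wireFlags(nthreads);  /* wire for the team actually granted */
        }   /* the implicit barrier after single publishes rounds and the wiring */
        int sense = 1, parity = 0;
        for (int e = 1; e <= episodes; e++) {
            #pragma omp atomic write seq_cst
            progress[tid] = e;
            disseminationBarrier(&processors[tid], &sense, &parity, rounds);
            for (int u = 0; u < nthreads; u++) {
                int seen;
                #pragma omp atomic read seq_cst
                seen = progress[u];
                if (seen < e) {
                    #pragma omp atomic update
                    violations++;
                }
            }
        }
    }
    return violations;
}
```

Built with something like `gcc -fopenmp`, a call such as `runBarrierDemo(4, 50)` is expected to return 0 and, more to the point, to return at all. Sequential consistency is the conservative choice here; weaker orderings may be enough but are harder to reason about.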
