
I am trying to add OpenMP parallelization to working code (just to a single for loop), but I cannot get rid of a segmentation fault. The problem arises from this line:

pos += sprintf(com + pos, "%d ", i);

com is a character array. I tried defining it as char com[255] or char *com = malloc(255*sizeof(char)), both inside and before the for loop. When I defined com before the loop, I added private(com) to the #pragma omp parallel for directive. I also tried initializing it and using firstprivate. (pos is an integer, initialized to 0.)

When I do not add -fopenmp everything works fine, but with -fopenmp it gives a segmentation fault. What am I missing?
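Roughly, the failing construct looks like this (a simplified sketch of the char com[255] + private(com) variant described above; N stands for the actual loop bound):

char com[255];
int pos = 0;               // shared by all threads

#pragma omp parallel for private(com)
for (int i = 0; i < N; i++)
   pos += sprintf(com + pos, "%d ", i);   // the line that crashes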

sencer

1 Answer


The segmentation fault comes from multiple threads updating the value of pos at the same time. This data race can leave pos with a value that makes com + pos point before or beyond the memory allocated for com. The proper way to parallelise such a loop is to build the values in private strings and then concatenate those private strings in an ordered fashion:

char com[255];
int pos = 0;

#pragma omp parallel
{
   // Each thread writes into its own private buffer with its own position counter
   char mycom[255];
   mycom[0] = '\0';   // keep mycom a valid (empty) string even if this thread gets no iterations
   int mypos = 0;

   #pragma omp for schedule(static) nowait
   for (int i = 0; i < N; i++)
      mypos += sprintf(mycom + mypos, "%d ", i);

   // Concatenate the private strings in an ordered fashion
   #pragma omp for schedule(static) ordered
   for (int i = 0; i < omp_get_num_threads(); i++)
   {
      #pragma omp ordered
      pos += sprintf(com + pos, "%s", mycom);
   }
}

The ordered construct ensures proper synchronisation, so one does not need critical. The use of schedule(static) is important: it guarantees that each thread processes a single contiguous section of the iteration space in the first loop, and that thread i executes iteration i of the concatenation loop, so the private strings are appended in thread-ID order.
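For completeness, a minimal self-contained program built around this pattern might look as follows (N, the buffer sizes and the final printf are arbitrary choices for the example; compile with gcc -fopenmp):

#include <stdio.h>
#include <omp.h>

#define N 100   // arbitrary loop bound for the example

int main(void)
{
   char com[1024];          // large enough for the N numbers written below
   int pos = 0;

   #pragma omp parallel
   {
      char mycom[1024];
      mycom[0] = '\0';      // valid empty string even if this thread gets no iterations
      int mypos = 0;

      #pragma omp for schedule(static) nowait
      for (int i = 0; i < N; i++)
         mypos += sprintf(mycom + mypos, "%d ", i);

      // Append the private strings to com in thread-ID order
      #pragma omp for schedule(static) ordered
      for (int i = 0; i < omp_get_num_threads(); i++)
      {
         #pragma omp ordered
         pos += sprintf(com + pos, "%s", mycom);
      }
   }

   printf("%s\n", com);
   return 0;
}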

Hristo Iliev
  • That's a clever solution. It would be nice if there were a way to get the chunk size and allocate `mycom` as a function of the chunk size (I can do this but it's not elegant) rather than all of `N`, so as not to waste memory. I mean the size of `mycom` could be roughly `N/nthreads`. Obviously, that makes no difference in this example though. – Z boson Mar 25 '15 at 12:50
  • Well, I've set the size of `mycom` to 255 just out of laziness, thus emulating `private(com)`. Since the schedule is `static` and the number of threads is known, one could compute it (kind of) easily. – Hristo Iliev Mar 25 '15 at 16:03
  • I think this is the first time I have seen `ordered` actually be useful: when filling private data in parallel but merging them in order. This avoids saving the private data for each thread and merging them in serial outside of the parallel region. I updated my answer [here](https://stackoverflow.com/questions/18669296/c-openmp-parallel-for-loop-alternatives-to-stdvector/18671256#18671256) based on your answer. Thanks! I wonder if this is possible to do with user-defined reductions? – Z boson Mar 26 '15 at 08:58
  • You could add the `nowait` clause to the first for loop. The barrier is not necessary. – Z boson Mar 26 '15 at 10:10
  • 1
    Good catch about `nowait`. You can merge (or reduce, that is) the private data inside the parallel region by e.g. looping over a critical section in which a counter is being incremented and compared against the ID of the current thread. But indeed, the ordered loop solution is more compact and expressive. – Hristo Iliev Mar 26 '15 at 12:26
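
A rough sketch of the critical-section merge mentioned in the last comment (one possible interpretation: turn is a shared counter initialised to 0 before the parallel region, and this fragment would replace the ordered concatenation loop):

int turn = 0;      // shared counter, declared next to com and pos

// ... inside the parallel region, after the first loop ...
int done = 0;
while (!done)
{
   #pragma omp critical
   {
      if (turn == omp_get_thread_num())
      {
         pos += sprintf(com + pos, "%s", mycom);
         turn++;    // hand over to the next thread ID
         done = 1;
      }
   }
   // the critical construct implies a flush, so the new value of turn becomes visible
}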