How to write to file from different threads, OpenMP, C++

Question

I use openMP for parallel my C++ program. My parallel code have very simple form

#pragma omp parallel for shared(a, b, c) private(i, result)
        for (i = 0; i < N; i++){
         result= F(a,b,c,i)//do some calculation
         cout<<i<<" "<<result<<endl;
         }

If two threads try to write into file simultaneously, the data is mixed up. How I can solve this problem?

score 9 · Accepted Answer · answered Sep 23 '17 at 15:31

OpenMP provides pragmas to help with synchronisation. #pragma omp critical allows only one thread to be executing the attached statement at any time (a mutual exclusion critical region). The #pragma omp ordered pragma ensures loop iteration threads enter the region in order.

// g++ -std=c++11 -Wall -Wextra -pedantic -fopenmp critical.cpp
#include <iostream>

int main()
{
  #pragma omp parallel for
  for (int i = 0; i < 20; ++i)
    std::cout << "unsynchronized(" << i << ") ";
  std::cout << std::endl;
  #pragma omp parallel for
  for (int i = 0; i < 20; ++i)
    #pragma omp critical
    std::cout << "critical(" << i << ") ";
  std::cout << std::endl;
  #pragma omp parallel for ordered
  for (int i = 0; i < 20; ++i)
    #pragma omp ordered
    std::cout << "ordered(" << i << ") ";
  std::cout << std::endl;
  return 0;
}

Example output (different each time in general):

unsynchronized(unsynchronized(unsynchronized(05) unsynchronized() 6unsynchronized() 1unsynchronized(7) ) unsynchronized(unsynchronized(28) ) unsynchronized(unsynchronized(93) ) unsynchronized(4) 10) unsynchronized(11) unsynchronized(12) unsynchronized(15) unsynchronized(16unsynchronized() 13unsynchronized() 17) unsynchronized(unsynchronized(18) 14unsynchronized() 19) 
critical(5) critical(0) critical(6) critical(15) critical(1) critical(10) critical(7) critical(16) critical(2) critical(8) critical(17) critical(3) critical(9) critical(18) critical(11) critical(4) critical(19) critical(12) critical(13) critical(14) 
ordered(0) ordered(1) ordered(2) ordered(3) ordered(4) ordered(5) ordered(6) ordered(7) ordered(8) ordered(9) ordered(10) ordered(11) ordered(12) ordered(13) ordered(14) ordered(15) ordered(16) ordered(17) ordered(18) ordered(19)

In case of just writing data to a file, using the last method ordered,can there be speedup ? I am asking because for me it looks if threads are exectuting the loop in order then it is equal to a serial execution right ? — AdityaG, Dec 04 '17 at 11:16
@AdityaG possibly there can still be parallel speed up, as the ordered pragma doesn't need to apply to the whole loop body, for example: `#pragma omp parallel for ordered \n for int (i = 0; i < 20; ++i) { do parallel work; #pragma omp ordered \n { do ordered serial output; } do more parallel work; }` — Claude, Dec 04 '17 at 17:57
I tested your code, for the case of just writing. I have no "do parallel work" but only a ordered statement in the for loop. What ever I do inside this ordered pragma, the parallel one is always slow (only a bit though). This approach can only be helpful if there is a good amount of work to be done in parallel compared to the work inside ordered pragma. — AdityaG, Dec 08 '17 at 11:54

score 1 · Answer 2 · answered Sep 22 '17 at 07:33

Problem is: you have a single resource all threads try to access. Those single resources must be protected against concurrent access (thread safe resources do this, too, just transparently for you; by the way: here is a nice answer about thread safety of std::cout). You could now protect this single resource e. g. with a std::mutex. Problem then is, that the threads will have to wait for the mutex until the other thread gives it back again. So you only will profit from parallelisation if F is a very complex function.

Further drawback: as threads work parallel, even with a mutex to protect std::in, the results can be printed out in arbitrary order, depending on which thread happens to operate earlier.

If I may assume that you want the results of F(... i) for smaller i before the results of greater i, you either should drop parallelisation entirely or do it differently:

Provide an array of size N and let each thread store its results there (array[i] = f(i);). Then iterate over the array in a separate non-parallel loop. Again, doing so is only worth the effort if F is a complex function (and for large N).

Additionally: Be aware that threads must be created, too, which causes some overhead somewhere (creating thread infrastructure and stack, registering thread at OS, ... – unless if you can reuse some threads already created in a thread pool earlier...). Consider this, too, when deciding if you want to parallelise or not. Sometimes, non-parallel calculations can be faster...

How to write to file from different threads, OpenMP, C++

2 Answers2