Parallelization with C++ OpenMP and file I/O: performance issues

Question

The problem essentially involves computing some function, void lineProcess(string, string&, int[]), on every line of a large (>20GB) data file. The computation is pretty hefty, and its cost depends heavily on the length of the input line and on some randomness introduced by the array parameter, so I've averaged times over several test runs. The first parameter is one line of the file; the second is a reference to a string through which the result is returned. The total size of the output is 3MB. There is no requirement for the kth line of input and the kth line of output to correspond. Apart from the file I/O, it sounds perfect for parallelising, so here's the code for it.

void foo(const int param[]) {
    // process some stuff ...
    // create input stream fin, output stream fout from <iostream>
    string result;
    for (string line; getline(fin, line);) {
#pragma omp parallel task firstPrivate(result)
        lineProcess(line, result, param);
        fout << result << endl;
    }
#pragma omp task wait
    fin.close();
    fout.close();
}

I've run it a few times on a laptop (quad-core i7, which should support 8 threads with hyper-threading) and don't seem to be seeing much speed-up. Serial line processing (i.e. the above minus the pragma directives) averages ~2800 secs/line and parallel ~2000 secs/line. I was aiming for a figure of ~600 secs/line. I think part of the issue may be my use of OpenMP, namely task and taskwait; however, as I don't know the number of lines in the file, I couldn't see an easy way to use #pragma omp for.
Ideally, I was trying for one buffer of lines read in and one of results, with all threads processing until one buffer was nearly empty/full, at which point a thread switches to refilling/emptying it by reading from/writing to disk. However, I'm not sure whether this is possible in OpenMP, or whether I could do a simple version of it with one thread solely swapping between reading and writing. Any advice on why this isn't as quick as expected, or on ways to improve performance, would be appreciated. Obviously there is the fundamental limit of having to read/write a lot of data, but I know the line processing takes a significant proportion of the time as well.

I found this question, which uses a very similar method: "openmp - while loop for text file reading and using a pipeline". The first answer matches my code well, but the second seems to use a buffer; however, I'm not sure how to fully adapt it or whether it's worthwhile.

score 1 · Answer 1 · answered Sep 26 '16 at 08:54

You should open your parallel region before the for loop. This creates one parallel region with multiple threads running, so at the point where you create your tasks there are threads up and running and ready to take them on.

#pragma omp parallel
{
 #pragma omp single
 {
  for(...)
  {
   #pragma omp task
   lineProcess(...)
   fout ...
  }
 }
 #pragma omp taskwait
}

Here the parallel region is opened first, and then the following for loop is marked to be executed by only one thread, which generates tasks that are in turn worked on by multiple threads. Once all lines have been processed (taskwait), execution of your normal code can continue.
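To make the structure above concrete, here is a minimal sketch that compiles on its own. The names reverseLine and processAll are hypothetical; reverseLine merely stands in for the expensive lineProcess, and an in-memory result vector stands in for fout. It assumes the compiler is invoked with OpenMP enabled (e.g. -fopenmp on GCC/Clang); without that flag the pragmas are ignored and the code simply runs serially.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the expensive lineProcess(): reverses the line.
std::string reverseLine(const std::string& line) {
    return std::string(line.rbegin(), line.rend());
}

// One thread reads lines and creates one task per line;
// every thread in the team (including the reader) may execute tasks.
// The order of the results is NOT guaranteed to match the input order.
std::vector<std::string> processAll(std::istream& fin) {
    assert(fin.good());  // illustrative precondition: stream must be readable
    std::vector<std::string> results;

    #pragma omp parallel
    {
        #pragma omp single
        {
            std::string line;
            while (std::getline(fin, line)) {
                // firstprivate(line): each task gets its own copy, so the
                // reading thread can safely overwrite `line` afterwards.
                #pragma omp task firstprivate(line) shared(results)
                {
                    std::string r = reverseLine(line);
                    // The vector is shared, so appends must be serialized.
                    #pragma omp critical(results_push)
                    results.push_back(std::move(r));
                }
            }
        }  // implicit barrier at the end of single: all tasks are complete here
    }
    return results;
}
```

Calling processAll on a std::istringstream (or a std::ifstream) returns one result per input line. Because of the implicit barrier at the end of the single construct, an explicit taskwait is not strictly needed in this variant.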

Also, you should note that only the lineProcess call is a task. After that task is generated (but not yet executed or finished), your generating thread moves on to the fout line and processes it. You can work around this like so:

#pragma omp task
{
 lineProcess(...)
 fout ...
 fout.flush();
}
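One caveat with putting the output inside the task: several tasks may then write to the same stream at the same time, which is a data race on the stream. A possible way around it, sketched below under the assumption that output order does not matter (as stated in the question), is to guard the write with a named critical section. upperLine and processToStream are hypothetical names; upperLine is only a placeholder for lineProcess.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for lineProcess(): upper-cases the line.
std::string upperLine(std::string line) {
    std::transform(line.begin(), line.end(), line.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return line;
}

// Each task processes one line AND writes its own result.
// Only the write is serialized; the processing runs in parallel.
void processToStream(std::istream& fin, std::ostream& fout) {
    assert(fin.good() && fout.good());  // illustrative precondition
    #pragma omp parallel
    {
        #pragma omp single
        {
            std::string line;
            while (std::getline(fin, line)) {
                #pragma omp task firstprivate(line)
                {
                    // Result is local to the task, so no sharing issues here.
                    std::string result = upperLine(line);
                    // Streams are not safe for concurrent writes:
                    // let only one task at a time touch fout.
                    #pragma omp critical(output_write)
                    fout << result << '\n';
                }
            }
        }  // implicit barrier: all tasks have finished writing
    }
    fout.flush();  // flush once at the end rather than after every line
}
```

Using '\n' and a single flush at the end, instead of endl or flush() per line, keeps the time spent inside the critical section short. With only about 3MB of total output that should not be the bottleneck, but it does reduce contention between tasks.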
