0

I want to parallelize a for loop with many iterations using OpenPM. The results should be stored in a vector.

for (int i=0; i<n; i++)
{
    // not every iteration produces a result
    if (condition)
    {
        results.push_back (result_value);
    }
}

This the does not work properly with the #pragma omp parallel for.

So what's the best practice to achieve that?
Is it somehow possible use a separate results vector for each thread and then combining all result vectors at the end? The ordering of the results is not important.

Something like that is not practical because it consumes to much space

int *results = new int[n];
for (int i=0; i<n; i++)
{
    // not every iteration produces a result
    if (condition)
    {
        results[i] = result_value;
    }
}

// remove all unused slots in results array
flappix
  • 2,038
  • 17
  • 28

2 Answers2

3

Option 1: If each iteration takes a significant amount of time before adding the element to the vector, you can keep the push_back in a critical region:

for (int i=0; i<n; i++)
{
    // not every iteration produces a result
    if (condition)
    {
#pragma omp critical
        results.push_back (result_value);
    }
}

If threads are mostly busy with other things than the push_back, there will be little overhead from the critical region.

Option 2: If iterations are too cheap compared to the synchronization overhead, you can have each vector fill a thread-private array and then merge them at the end:

There is a good duplicate for this here and here.

Max Langhof
  • 23,383
  • 5
  • 39
  • 72
2

The "naive" way: You can init several vectors (call omp_get_max_threads() to know the thread count inside the current parallel region) then call omp_get_thread_num() inside the parallel region to know the current thread ID, and let each thread write into its vector. Then outside the parallel region merge the vectors together. This can be worth it or not, depending on how "heavy" your processing is compared to the time required to merge the vectors.

If you know the maximum final size of the vector, you can reserve it before processing (so that push_back calls won't resize the vector and you gain processing time) then call the push_back method from inside a critical section (#pragma omp critical), but critical sections are horribly slow so it's worth it only if the processing you do inside the loop is time consuming. In your case the "processing" looks to be only checking the if-clause, so it's probably not worth it.

Finally, it's a quite known problem. You should read this for more detailed information: C++ OpenMP Parallel For Loop - Alternatives to std::vector

L.C.
  • 1,098
  • 10
  • 21