
So I have a loop where I iterate over the elements of a vector, call a function on each element, and if it meets a certain criterion, I push it onto a list.

my_list li;

for (auto itr = Obj.begin(); itr != Obj.end(); ++itr) { 
     if ((*itr).function_call()) 
         li.push_back(*itr);
}

I've been thinking of ways to optimize my program, and I came across OpenMP, but a lot of the sample code is hard to follow.

Could someone walk me through how to convert the above loop to utilize multiple cores in parallel?

Thanks.

Connor Black

2 Answers


There are a few points you need to take care of to parallelize that code snippet:

  1. If you're using OpenMP 3.0 (or above) you can parallelize the iterator-based for-loop with #pragma omp parallel for; if you're using an older version of OpenMP, you need to rewrite it as a loop that accesses the vector by index.
  2. You need to guard the li.push_back(*itr); statement with a lock or place it in a critical section.
  3. If function_call is not a really slow function, or your vector does not contain many items, it may not be worth parallelizing at all, since thread creation introduces overhead.

So a pseudo-code implementation would be

my_list li;
 #pragma omp parallel for
 for (auto itr = Obj.begin(); itr != Obj.end(); ++itr) {
     if ((*itr).function_call())
     {
         #pragma omp critical (CRIT_1)
         {
             li.push_back(*itr);
         }
     }
 }
ikikenis
  • It is good practice to not use the nameless critical construct. You should add a name to it to make sure that the critical only protects the li data structure. Otherwise the critical might affect other critical sections that do not need to be mutually exclusive to this code snippet. – Michael Klemm Nov 25 '13 at 16:06
  • I'd iterate the vector via a pointer to the start of the data and run an index. for (T* p = &obj[0], size_t i = 0, size_t size = obj.size(); i < size; ++i) { } – egur Nov 25 '13 at 19:22
  • Great catch, thanks Michael. Edited the solution now. – ikikenis Nov 26 '13 at 07:49
  • That's inefficient. A better solution is to declare private vectors/lists for each thread, fill them in parallel (without a critical section), and then merge the private vectors/lists into the shared one in a critical section http://stackoverflow.com/questions/18669296/c-openmp-parallel-for-loop-alternatives-to-stdvector/18671256#18671256 – Z boson Dec 05 '13 at 07:28

The time has come to discuss efficient ways to use container classes such as std::list or std::vector with OpenMP (since the OP wants to optimize his code using lists with OpenMP). Let me list four ways in increasing level of efficiency.

  1. Fill the container in a parallel section in a critical block
  2. Make private versions of the container for each thread, fill them in parallel, and then merge them in a critical section
  3. Don't use STL containers. STL was not designed with efficiency in mind. Instead either write your own or use something like Agner Fog's containers which are designed for efficiency. For example instead of using a heap for memory allocation they use a memory pool.
  4. In some special cases it's possible to merge the private versions of the containers in parallel as well.

Example code for the first case is given in the accepted answer. This defeats most of the purpose of using threaded code, since each iteration fills the container inside a critical section.

Example code for the second case can be found at C++ OpenMP Parallel For Loop - Alternatives to std::vector. Rather than re-post the code here, let me give an example for the third case using Agner Fog's container classes:

DynamicArray<int> vec;
#pragma omp parallel
{
    DynamicArray<int> vec_private;
    #pragma omp for nowait //fill vec_private in parallel
    for(int i=0; i<100; i++) {
        vec_private.Push(i);
    }
    //merging here is probably not optimal
    //Dynamic array needs an append function
    //vec should reserve a size equal to the sum of size each vec_private
    //then use memcpy to append vec_private into vec in a critical section
    #pragma omp critical
    {
        for(int i=0; i<vec_private.GetNum(); i++) {
            vec.Push(vec_private[i]);
        }
    }        
}

Finally, in special cases, for example with histograms (probably the most common data structure in experimental particle physics), it's possible to merge the private arrays in parallel as well. For histograms this is equivalent to an array reduction. This is a bit tricky. An example showing how to do this can be found at Fill histograms (array reduction) in parallel with OpenMP without using a critical section

Z boson