The time has come to discuss efficient ways to use container classes such as std::list or std::vector with OpenMP (since the OP wants to optimize his code using lists with OpenMP). Let me list four ways in increasing level of efficiency.
- Fill the container in a parallel section in a critical block
- Make private versions of the container for each thread, fill them in parallel, and then merge them in a critical section
- Don't use STL containers. STL was not designed with efficiency in mind. Instead either write your own or use something like Agner Fog's containters which are designed for efficiency. For example instead of using a heap for memory allocation they use a memory pool.
- In some special cases it's possible to merge the private versions of the containers in parallel as well.
Example code for the first case is given in the accepted answer. This defeats most of the purpose of using threaded code since each iteration fills the container in critical section.
Example code for the second case can be found at C++ OpenMP Parallel For Loop - Alternatives to std::vector. Rather than re-post the code here let me give an example for the third case using Agner Fog's container classes
DynamicArray<int> vec;
#pragma omp parallel
{
DynamicArray<int> vec_private;
#pragma omp for nowait //fill vec_private in parallel
for(int i=0; i<100; i++) {
vec_private.Push(i);
}
//merging here is probably not optimal
//Dynamic array needs an append function
//vec should reserve a size equal to the sum of size each vec_private
//then use memcpy to append vec_private into vec in a critcal section
#pragma omp critical
{
for(int i=0; i<vec_private.GetNum(); i++) {
vec.Push(vec_private[i]);
}
}
}
Finally, in special cases for example with histograms (probably the most common data structure in experimental particle physics), it's possible to merge the private arrays in parallel as well. For the histograms this is equivalent to an array reduction. This is a bit tricky. An example showing how to do this can be found at Fill histograms (array reduction) in parallel with OpenMP without using a critical section