10

I am trying to parallize a program I am using and got the following question. Will I get a loss of performance if multiple threads need to read/write on the same vector but different elements of the vector ? I have the feeling thats the reason my program hardly gets any faster upon parallizing it. Take the following code:

#include <vector> 

int main(){

    vector<double> numbers;
    vector<double> results(10);
    double x;

    //write 10 values in vector numbers
    for (int i =0; i<10; i++){
        numbers.push_back(cos(i));  
    } 

#pragma omp parallel for \
    private(x) \
    shared(numbers, results)
        for(int j = 0;  j < 10;  j++){

            x  =  2 * numbers[j]  +  5;  
#pragma omp critical  // do I need this ?
            {
                results[j]  =  x;     
            }
        }

    return 0;

}

Obviously the actual program does far more expensive operations, but this example shall only explain my question. So can the for loop be done fast and completely parallel or do the different threads have to wait for each other because only one thread at a time can access the vector number for instance although they are all reading different elements of the vector ?

Same question with the write operation: Do I need the critical pragma or is it no problem since every thread writes into a different element of the vector results ? I am happy with every help I can get and also it would be good to know if there is a better way to do this (maybe not use vectors at all, but simple arrays and pointers etc. ?) I also read vectors aren't thread safe in certain cases and it is recommended to use a pointer: OpenMP and STL vector

Thanks a lot for your help!

Community
  • 1
  • 1
user1304680
  • 700
  • 2
  • 5
  • 18

2 Answers2

9

I imagine that most of the issues with vectors in multiple threads would be if it has to resize, then it copies the entire contents of the vector into a new place in memory (a larger allocated chunk) which if you're accessing this in parallel then you just tried to read an object that has been deleted.

If you are not resizing your array, then I have had never had any trouble with concurrent read writes into the vector (obviously as long as I'm not writing twice the same element)

As for the lack of performance boost, the openmp critical section will slow your program down to probably the same as just using 1 thread (depending on how much is actually done outside that critical section)

You can remove the critical section statement (with the conditions above in mind).

SirGuy
  • 10,660
  • 2
  • 36
  • 66
  • He does not resize the vector at all. – eudoxos Mar 31 '12 at 09:22
  • @eudoxos I realize that from the code snippet, I just wanted to make sure it was mentioned, especially since he brought up the fact that STL vectors are not thread-safe under certain conditions – SirGuy Mar 31 '12 at 12:43
  • 1
    +1: it doesn't realy come up here, but you do have to keep in mind that the vector-specific operations like appending, resizing, etc are not threadsafe and would probably break. But just operating on the elements of a vector is fine as long as each element is being written by only one thread. – Jonathan Dursi Mar 31 '12 at 15:35
  • Hi, thanks for your helpful response! It was also interesting to hear about the problem with resizing etc., since I do not only want to know whats going on in my special little example, but merely want to learn about vectors and OpenMP in general. I have understood I don't have to do anything special as long as every thread writes in his own element of a vector. – user1304680 Apr 01 '12 at 19:05
5

You get no speedup precisely because of the critical sectino, which is superfluous, since the same elements will never be modified at the same time. Remove the critical section piece and it will work just fine.

You can play with the schedule strategy as well, because if memory access is not linear (it is in the example you gave), threads might fight for cache (writing elements in the same cache line). OTOH if the number of elements is given as in your case and there is no branching in the loop (therefore they will execute at about the same speed), static, which is IIRC the default, should work the best anyway.

(BTW you can declare x inside the loop to avoid private(x) and the shared directive is implied IIRC (I never used it).)

eudoxos
  • 18,545
  • 10
  • 61
  • 110
  • +1; if you're not resizing or appending to or what have you a vector, it's just a 1d array, and OpenMP is very good at operating on arrays. As for static, private, etc, I think "best practice" is to use default(none) and make everything explicitly private or shared, just to be explicit, and as @eudoxos points out, having variables defined in the scope of the parallel section makes them implicitly private and then it's a little easier to follow the code. – Jonathan Dursi Mar 31 '12 at 15:36
  • Hi thanks for your help. Is it really faster to declare x inside the loop (every thread will declare it then, right) rather than before the loop as above, or is it the same performance wise, but just nicer to look at ? What does it mean that the memory access is not linear and why is this so ? – user1304680 Apr 01 '12 at 19:08
  • @user1304680: Declaration is for the compiler, it just says that you want to have a piece of memory of some size and call it somehow in the subsequent code. I doubt makes a difference for any decent, moderately optimizing compiler. Non-linear access: I meant random access to elements from threads in adjacent memory regions. RAM then serves a chunk of memory to the CPU, which is larger than the vector item. If 2 cores access pieces of RAM nearby, it might be in the same cache line and they are then synchronized on the hw level, but it makes it slower. Don't worry about it until you need. – eudoxos Apr 02 '12 at 11:54