Shared vectors in OpenMP 2

Question

in my previous question Shared vectors in OpenMP it was stated that one can let diferent threads read and write on a shared vector as long as the different threads access different elements of the vector. What if different threads have to read all the (so sometimes the same) elements of a vector, like in the following case ?

#include <vector> 

int main(){

vector<double> numbers;
vector<double> results(10);
double x;

//write 25 values in vector numbers
for (int i =0; i<25; i++){
    numbers.push_back(cos(i));  
} 

#pragma omp parallel for default(none) \
shared(numbers, results) \
private(x)
    for (int j = 0;  j < 10;  j++){
        for(int k = 0; k < 25; k++){
            x += 2 * numbers[j] * numbers[k] + 5 * numbers[j * k / 25];
        }
        results[j]  =  x;     
    }

    return 0;

}

Will this parallelization be slow because only one thread at a time can read any element of the vector or is this not the case? Could I resolve the problem with the clause firstprivate(numbers)?

Would it make sense to create an array of vectors so every thread gets his own vector ?

For instance:

vector<double> numbersx[**-number of threads-**];

score 2 · Accepted Answer · answered Apr 01 '12 at 20:56

2

Reading elements of the same vector from multiple threads is not a problem. There is no synchronization in your code, so they will be accessed concurrently.

With the size of vectors that you are working with, you will not have any cache problems either, although for bigger vectors you may get some slow-downs due to the cache access pattern. In that case, separate copies of the numbers data would improve performance.

answered Apr 01 '12 at 20:56

Lubo Antonov

2,301
14
18

Thanks for your response. In fact the vector I actually want to use might have up to 67*10^6 entries (with double or float datatype). So should I make copies and how ? What do you mean with 'there is no synchronization in your code' ? – user1304680 Apr 01 '12 at 22:05
@user1304680, it sounds like you are new to multi-therading, so I would suggest reading more background information. Briefly, synchronization is a way to restrict access to a resource so that multiple threads can read it and modify it. It is a way to ensure logical correctness, but it kills parallelism. Synchronization is done through atomic operations, critical sections, mutexes. – Lubo Antonov Apr 01 '12 at 22:10
@user1304680, I would suggest NOT doing a copy to start with and verifying performance scaling. If the performance does not scale with the number of cores, then you can look at optimizing cache access. – Lubo Antonov Apr 01 '12 at 22:14
I have understood that synchronization is needed, when different threads write on the same resource (then you need to use critical etc.) In the document I was learning from however I could not find out how it will change the performance when different threads are reading the same resource. In this case there are no issues with logical correctnes, but I was not sure if the performance will be reduced. Ok, next I will see how the performance is scaling with the number of threads being used and will come back here. – user1304680 Apr 02 '12 at 12:12
So I have tried different things and came to the conclusion, that for some vectors (mostly smaller ones that get accessed quite often) in my program it was worthwile to make a copy for each thread and for others (the very big ones) it wasn't. Basically what I learned and makes the answer to my question is that in general it can make a difference in performance when different threads are reading the same elements. Here it was written that this is understandable in terms of the cache access pattern, any source where I could read something about this ? – user1304680 Apr 06 '12 at 09:19

score 1 · Answer 2 · answered Apr 01 '12 at 19:14

1

better approach:

#include <vector> 

int main(){

vector<double> numbers;
vector<double> results(10);

//write 25 values in vector numbers
for (int i =0; i<25; i++){
    numbers.push_back(cos(i));  
} 

#pragma omp parallel for
    for (int j = 0;  j < 10;  j++){
        double x = 0; // make x local var
        for(int k = 0; k < 25; k++){
            x += 2 * numbers[j] * numbers[k] + 5 * numbers[j * k / 25];
        }
        results[j]  =  x; // no race here     
    }

    return 0;

}

it will be slow kinda due to fact that there isn't much work to share

answered Apr 01 '12 at 19:14

Anycorn

50,217
42
167
261

Thank you. I can see how you declare x inside the loop, this was already discussed in http://stackoverflow.com/questions/9953905/shared-vectors-in-openmp, but else it is still the same since the vectors 'results' and 'numbers' are implicitly shared, right ? There might not be much work to share, but my question was if in general the parallelization will be slow because different threads read the same elements of the vector 'numbers'. – user1304680 Apr 01 '12 at 19:19
reading isn't likely to be a performance problem, writing may be due to false sharing (google). you can solve it by chunking the outer loop. – Anycorn Apr 01 '12 at 19:24

Shared vectors in OpenMP 2

2 Answers2