2

I'm getting started with OpenMP and I want to parallelize this portion of code:

for (i = 0; i < n; i++)
    for (j = 1; j < n; j++)
        A[i][j] += A[i][j-1];

and I find this answer:

#pragma omp parallel for private(i, j) shared(A, n)
for (i = 0; i < n; ++i) 
  for (j = 1; j < n; ++j)  
    A[i][j] += A[i][j-1];

I have some questions:

  • why is i private and not shared?
  • about this answer: if I have 4 threads, each thread executes the same code, so I don't understand how I can obtain the same result as the sequential code.

How do the threads divide this work between them? I need your help.

Drew Dormann
David

2 Answers

1

why is i private and not shared?

The variables i and j are private to each thread because they are loop counters: each thread has to keep track of where it is in the loops, independently of the other threads. If i or j were shared, every thread would be updating the same counters and would interfere with the others' execution.

if I have 4 threads, each thread executes the same code, so I don't understand how I can obtain the same result as the sequential code?

Each thread executes the same lines of code, but on different data. OpenMP's `for` directive automagically distributes the iterations among the threads, so each thread gets a different range of values of i. In effect, each thread runs the same loop body with a different loop initialization and end condition. Together they will (if the loop is programmed correctly) give the same end result as a sequential implementation.

Of course, for this to work it is necessary for the loop to be parallelizable. If each iteration of the loop depends on results calculated by previous iterations, then it can't be run in parallel. In that case you'll need to rewrite the loop so that each iteration can be run independently.

Manuel M
  • I mean that the end result will be the same in both situations: (1) sequential implementation; (2) each thread executes a set of iterations. – Manuel M Jan 26 '15 at 13:00
  • However, be careful about which loop you parallelize. The inner j loop is not parallelizable: each iteration needs A[i][j-1] from the previous one, so putting the pragma on the j loop would give an undefined result (think: "If I change the order of execution of the iterations, will the result change?"). The pragma in the answer is on the outer i loop, whose iterations (the rows) are independent of each other. An example where both loops are independent: #pragma omp parallel for private(i, j) shared(A, B, n) for (i = 0; i < n; ++i) for (j = 1; j < n; ++j) B[i][j] += A[j][i]; — this adds (all but the first column of) the transpose of A to B. Note that the `parallel for` directive must be followed directly by the for loop, not by a brace block. – Manuel M Jan 26 '15 at 13:03
  • 1
    @David Maybe you will understand what I mean if you read about [race conditions](http://stackoverflow.com/questions/34510/what-is-a-race-condition) and about [loop-carried dependences](https://engineering.purdue.edu/~smidkiff/ece495S/files/handouts/w12BW.pdf). Those are the two key concepts that you need to understand before you can write a parallel program. – Manuel M Jan 29 '15 at 13:03
0

I've never used OpenMP, but your question made me curious. I hope this documentation will help you; it was helpful for me: https://computing.llnl.gov/tutorials/openMP/#Exercise1

Neska