
I'm getting started with OpenMP and I want to parallelize this portion of code:

for (i = 0; i < n; i++)
    for (j = 1; j < n; j++)
        A[i][j] += A[i][j-1];

and I found this answer:

#pragma omp parallel for private(i, j) shared(A, n)
for (i = 0; i < n; ++i) 
  for (j = 1; j < n; ++j)  
    A[i][j] += A[i][j-1];

I have some questions:
- Why is i private and not shared?
- With this answer, if I have 4 threads, does each thread run the full (i = 0; i < n; ++i) and (j = 1; j < n; ++j) iterations? I need your help.

  • Possible duplicate of [I need help understanding this openMP example](http://stackoverflow.com/questions/28145951/i-need-help-understanding-this-openmp-example) – starturtle Aug 19 '16 at 06:39

1 Answer


1) i is private because each thread needs its own copy of the loop counter to step through its share of the iterations; if i were shared, all the threads would be updating one single counter for ONE loop from 0 to n and would interfere with each other.

2) Yes, in this code each thread has its own copy of the i and j variables, which is why they can operate independently (see the sketch after point 3).

3) I am not sure about this example, but in general you must avoid data dependencies because they cause problems when making code run in parallel: each processor (or worker) should do exactly one job without depending on another worker's state or result, to get the most efficiency. See SIMD and look up vectorization. In short, vectorization is a technique that helps a lot in parallelizing code because it implements the SIMD paradigm. On modern CPUs such as Intel Sandy Bridge and later architectures, it can speed up your parallel computations considerably by using the AVX/AVX2 extensions.
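To make points 1 and 2 concrete, here is a minimal, self-contained sketch (my own illustration, not part of the original answer; the array size N = 8 and the rows-of-ones initialization are arbitrary choices). Each thread reports which rows of the outer loop it was given, runs the whole dependent j loop for those rows by itself, and the final check shows the prefix sums still come out right. Compile with something like `gcc -fopenmp example.c`:

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void)
{
    double A[N][N];
    int i, j;

    /* fill each row with ones so the expected prefix sums are easy to check */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            A[i][j] = 1.0;

    #pragma omp parallel for private(i, j) shared(A)
    for (i = 0; i < N; i++) {
        /* each thread gets a disjoint set of rows ... */
        printf("thread %d handles row i = %d\n", omp_get_thread_num(), i);
        /* ... and runs the dependent j loop for its rows entirely by itself */
        for (j = 1; j < N; j++)
            A[i][j] += A[i][j-1];
    }

    /* with rows of ones, every A[i][j] must end up as j + 1 */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (A[i][j] != j + 1)
                printf("unexpected value at (%d, %d)\n", i, j);

    return 0;
}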

VP.
  • [The for directive splits the for-loop so that each thread in the current team handles a different portion of the loop.](http://bisqwit.iki.fi/story/howto/openmp/) If 3 threads are used, the second thread will have `i` in the range `n/3..2n/3` and `j` in the range `1..n`. So there is no problem regarding data dependency: the `j` loop is not parallel. – francis Jan 25 '15 at 10:45
  • That's why I've said "I am not sure" :) Thanks. But this information is useful to know anyway, I'll keep it in the answer. – VP. Jan 25 '15 at 10:46
  • Thank you for your interesting answer, but I don't understand how I can obtain a correct result when each thread executes the same code (in the case of 4 threads)? –  Jan 25 '15 at 11:29
  • @francis, the `j` loop is not parallel, so the result will be correct; however, the dependency on previous `j` iterations will cause auto-vectorization to fail. [It can be done with SIMD but it's not trivial](http://stackoverflow.com/questions/19494114/parallel-prefix-cumulative-sum-with-sse) (a hedged sketch of one SIMD-friendly formulation follows these comments). – Z boson Jan 25 '15 at 13:21
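Regarding the auto-vectorization point above: the inner loop is an inclusive prefix sum (scan) over each row, and the A[i][j-1] dependency is what stops the compiler from vectorizing it naively. As an aside that goes beyond the original thread (and assumes a compiler with OpenMP 5.0 scan support, e.g. a recent GCC or Clang), OpenMP 5.0 added an `inscan` reduction plus a `scan` directive that expresses exactly this pattern so the compiler is allowed to emit SIMD code for it. A sketch, with my own function name, that writes the row prefix sums into a separate output array B:

#include <omp.h>

/* Hypothetical helper: inclusive prefix sum of every row of A, written to B.
 * Requires OpenMP 5.0 scan support in the compiler. */
void row_prefix_sums(int n, double A[n][n], double B[n][n])
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double run = 0.0;
        #pragma omp simd reduction(inscan, +: run)
        for (int j = 0; j < n; j++) {
            run += A[i][j];               /* input phase: feed the running sum */
            #pragma omp scan inclusive(run)
            B[i][j] = run;                /* scan phase: run holds the inclusive prefix */
        }
    }
}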