OpenMP with Fortran: why are multiple DO loops faster than WORKSHARE?

Question

I don't understand why this code:

double precision :: array(200,200,100)
double precision :: array2(200,200,100)

!$OMP BARRIER
!$OMP DO SCHEDULE(static)
do z=1,100
   do y=1,200
      do x=1,200
         array2(x,y,z)=array(x,y,z)
      enddo
   enddo
enddo
!$OMP END DO NOWAIT
!$OMP BARRIER

is much faster (~25% with 32 threads, compiled with ifort) than this one:

double precision :: array(200,200,100)
double precision :: array2(200,200,100)

!$OMP BARRIER
!$OMP WORKSHARE
array2=array
!$OMP END WORKSHARE NOWAIT
!$OMP BARRIER

These two snippets are supposed to do exactly the same thing.

Edit: oops, I made a mistake when renaming my arrays, sorry.

Edit 2: Sorry, I didn't search enough before posting. I found my answer in the question "Parallelizing fortran 2008 `do concurrent` systematically, possibly with openmp":

Usage of OpenMP workshare directive is currently discouraged. It turns out that at least Intel Fortran Compiler and GCC serialise FORALL statements and constructs inside OpenMP workshare directives by surrounding them with OpenMP single directive during compilation which brings no speedup whatsoever. Other compilers might implement it differently but it's better to avoid its usage if portable performance is to be achieved.
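To check the difference on a given compiler, the two variants can be timed side by side. The following is only a minimal sketch, not the original code: it assumes combined `parallel do` and `parallel workshare` constructs (the question uses orphaned directives inside a parallel region that is not shown), and the program name, the third array `array3`, and the timing variables are made up for illustration. A single timed pass is noisy and includes thread start-up cost, so treat the numbers as a rough indication only.

```fortran
program copy_compare
   ! Hypothetical test harness; names other than array/array2 are assumptions.
   use omp_lib, only: omp_get_wtime
   implicit none
   integer, parameter :: nx = 200, ny = 200, nz = 100
   double precision, allocatable :: array(:,:,:), array2(:,:,:), array3(:,:,:)
   double precision :: t0, t_do, t_ws
   integer :: x, y, z

   allocate(array(nx,ny,nz), array2(nx,ny,nz), array3(nx,ny,nz))
   call random_number(array)
   array2 = 0.0d0
   array3 = 0.0d0

   ! Variant 1: explicit loop nest, outer z loop split statically across threads.
   t0 = omp_get_wtime()
   !$omp parallel do schedule(static) private(x, y)
   do z = 1, nz
      do y = 1, ny
         do x = 1, nx
            array2(x,y,z) = array(x,y,z)
         end do
      end do
   end do
   !$omp end parallel do
   t_do = omp_get_wtime() - t0

   ! Variant 2: whole-array assignment inside WORKSHARE.
   ! How the work is divided here is left to the compiler.
   t0 = omp_get_wtime()
   !$omp parallel workshare
   array3 = array
   !$omp end parallel workshare
   t_ws = omp_get_wtime() - t0

   ! Both copies must reproduce the source array exactly.
   if (any(array2 /= array) .or. any(array3 /= array)) error stop "copy mismatch"

   print '(a,es10.3,a)', 'DO loops : ', t_do, ' s'
   print '(a,es10.3,a)', 'WORKSHARE: ', t_ws, ' s'
end program copy_compare
```

Build it with OpenMP enabled (for example `-fopenmp` with gfortran or `-qopenmp` with ifort) and set `OMP_NUM_THREADS` to the thread count of interest. Repeating each variant several times and keeping the best time would give a fairer comparison.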

The OP sees it as a duplicate himself, but the referenced question does not mention `workshare` at all. — Vladimir F Героям слава, Feb 25 '15 at 17:24
[That question](http://stackoverflow.com/questions/17812003/parallelization-of-elementwise-matrix-multiplication) is closer to this one. — Hristo Iliev, Feb 26 '15 at 08:12
Intuitively I'd say the first version is more explicit in telling the compiler that you only want to parallelize the outer loop, while in the second version the compiler might decide to create more threads, which could generate overhead. — Michel Müller, Feb 26 '15 at 10:05
