Parallelizing Fortran 2008 `do concurrent` systematically, possibly with OpenMP

Question

The Fortran 2008 do concurrent construct is a do loop that tells the compiler that no iteration affects any other. It can thus be parallelized safely.

A valid example:

program main
  implicit none
  integer :: i
  integer, dimension(10) :: array
  do concurrent (i = 1:10)
    array(i) = i
  end do
end program main

where iterations can be done in any order. You can read more about it here.

To my knowledge, gfortran does not automatically parallelize these do concurrent loops, although I remember a gfortran mailing-list post about doing it (here). It just transforms them into classical do loops.

My question: Do you know a way to systematically parallelize do concurrent loops, for instance with a systematic OpenMP syntax?

If you are still working on this, don't use `FORALL` with `workshare`. See my updated answer below. – Hristo Iliev, Jul 25 '12 at 06:19

Hristo Iliev · Accepted Answer · 2012-07-25T11:14:56.787

It is not that easy to do it automatically. The DO CONCURRENT construct has a forall-header, which means that it can accept multiple loops, index variable definitions and a mask. Basically, you need to replace:

DO CONCURRENT([<type-spec> :: ]<forall-triplet-spec 1>, <forall-triplet-spec 2>, ...[, <scalar-mask-expression>])
  <block>
END DO

with:

[BLOCK
    <type-spec> :: <indexes>]

!$omp parallel do
DO <forall-triplet-spec 1>
  DO <forall-triplet-spec 2>
    ...
    [IF (<scalar-mask-expression>) THEN]
      <block>
    [END IF]
    ...
  END DO
END DO
!$omp end parallel do

[END BLOCK]

(things in square brackets are optional, based on the presence of the corresponding parts in the forall-header)
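
As a concrete illustration of the template above, here is a minimal sketch for one made-up loop with two indices, a stride and a mask. The array `a`, the bounds `n` and `m`, and the mask `i /= j` are assumptions chosen for the example, not part of the question. Note that a forall-triplet-spec written as `1:m:2` has to be rewritten as the DO bounds `1, m, 2`. The program compares the hand-translated nest against the original DO CONCURRENT and stops with an error if they differ; without OpenMP enabled the directives are plain comments and the code still runs serially.

```fortran
program nested_omp_sketch
  implicit none
  integer, parameter :: n = 4, m = 6   ! hypothetical sizes
  integer :: a(n, m), expected(n, m)
  integer :: i, j

  ! Reference result from the original construct.
  expected = 0
  do concurrent (i = 1:n, j = 1:m:2, i /= j)
    expected(i, j) = 10*i + j
  end do

  ! Hand translation: one DO per triplet, mask as IF in the innermost loop.
  a = 0
  !$omp parallel do private(j)
  do i = 1, n
    do j = 1, m, 2          ! triplet 1:m:2 becomes bounds 1, m, 2
      if (i /= j) then
        a(i, j) = 10*i + j
      end if
    end do
  end do
  !$omp end parallel do

  if (any(a /= expected)) error stop "translation mismatch"
  print *, "translated nest matches do concurrent"
end program nested_omp_sketch
```

With gfortran this can be built with `-fopenmp` to enable the directives. Only the outer loop is distributed across threads here.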

Note that this would not be as effective as parallelising one big loop with <iters 1>*<iters 2>*... independent iterations which is what DO CONCURRENT is expected to do. Note also that forall-header permits a type-spec that allows one to define loop indexes inside the header and you will need to surround the whole thing in BLOCK ... END BLOCK construct to preserve the semantics. You would also need to check if scalar-mask-expr exists at the end of the forall-header and if it does you should also put that IF ... END IF inside the innermost loop.
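
One possible way to get closer to "one big loop" is the OpenMP `collapse` clause (OpenMP 3.0 and later), which asks the runtime to treat a perfectly nested, rectangular loop nest as a single iteration space. This is a sketch under that assumption, with made-up names (`a`, `n`, `m`) and a made-up mask; it is not part of the transformation described above, and whether it performs better depends on the compiler and the loop body.

```fortran
program collapse_sketch
  implicit none
  integer, parameter :: n = 4, m = 6   ! hypothetical sizes
  integer :: a(n, m), expected(n, m)
  integer :: i, j

  ! Reference result from the original construct.
  expected = 0
  do concurrent (i = 1:n, j = 1:m, mod(i + j, 2) == 0)
    expected(i, j) = i * j
  end do

  ! collapse(2) merges both loops into one n*m iteration space.
  ! The loops must be perfectly nested: nothing between the two DO lines.
  a = 0
  !$omp parallel do collapse(2)
  do i = 1, n
    do j = 1, m
      if (mod(i + j, 2) == 0) then   ! mask stays inside the innermost body
        a(i, j) = i * j
      end if
    end do
  end do
  !$omp end parallel do

  if (any(a /= expected)) error stop "collapse translation mismatch"
  print *, "collapsed nest matches do concurrent"
end program collapse_sketch
```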

If you only have array assignments inside the body of the DO CONCURRENT, you could also transform it into FORALL and use the workshare OpenMP directive. It would be much easier than the above.

DO CONCURRENT <forall-header>
  <block>
END DO

would become:

!$omp parallel workshare
FORALL <forall-header>
  <block>
END FORALL
!$omp end parallel workshare
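
For a concrete instance of this second form, the following sketch rewrites one made-up masked loop as a FORALL inside `parallel workshare`. The arrays `a` and `b`, the size `n` and the mask `b(i) > 3` are assumptions for illustration. The program only checks that the result agrees with the original DO CONCURRENT; whether a given compiler actually spreads the FORALL across threads is a separate matter.

```fortran
program workshare_sketch
  implicit none
  integer, parameter :: n = 8          ! hypothetical size
  integer :: a(n), b(n), expected(n)
  integer :: i

  b = [(i, i = 1, n)]

  ! Reference result from the original construct.
  expected = 0
  do concurrent (i = 1:n, b(i) > 3)
    expected(i) = 2 * b(i)
  end do

  ! Same forall-header, body restricted to assignments.
  a = 0
  !$omp parallel workshare
  forall (i = 1:n, b(i) > 3)
    a(i) = 2 * b(i)
  end forall
  !$omp end parallel workshare

  if (any(a /= expected)) error stop "workshare translation mismatch"
  print *, "forall/workshare result matches do concurrent"
end program workshare_sketch
```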

Given all the above, the only systematic way that I can think of is to go through your source code, search for DO CONCURRENT and replace each occurrence with one of the transformed constructs above, chosen according to the content of the forall-header and the loop body.

Edit: Use of the OpenMP workshare directive is currently discouraged. It turns out that at least the Intel Fortran Compiler and GCC serialise FORALL statements and constructs inside OpenMP workshare directives by wrapping them in an OpenMP single directive during compilation, which brings no speedup whatsoever. Other compilers might implement it differently, but it's better to avoid it if portable performance is to be achieved.

Thanks for your update. Do you have a source to read from about this discouraged behavior? — max, Jul 25 '12 at 09:58
With GCC you can look at the [source code](http://gcc.gnu.org/svn/gcc/branches/gcc-4_7-branch/gcc/fortran/). Some constructs are parallelised, e.g. array assignment, but `FORALL` is not among them. With other compilers you can look at the assembly output. — Hristo Iliev, Jul 25 '12 at 11:12
I should also add that compiler vendors are actually solving exactly the same problem that you are trying to solve :) — Hristo Iliev, Jul 25 '12 at 11:20

score 2 · Answer 2 · answered Jul 18 '12 at 21:43

I'm not sure what you mean by "a way to systematically parallelize do concurrent loops". However, to simply parallelise an ordinary do loop with OpenMP you could just use something like:

!$omp parallel private (i)
!$omp do
do i = 1,10
    array(i) = i
end do
!$omp end do
!$omp end parallel

Is this what you are after?

Chris

Sorry for the vague "systematically". As an example, could I _grep_ or _awk_ for `do concurrent`; XX; `end do` everywhere in the code, and replace it (with _sed_ or _awk_, for instance) with always the same OpenMP syntax? It should not be occurrence-specific (apart from the loop variable, of course). Your answer may help in this way, but is it always the correct syntax for all kinds of stuff in between `do concurrent` and `end do`? – max Jul 19 '12 at 05:57
As far as I know this should be sufficient, given the restrictions on what can go into a `do concurrent` construct - hopefully someone more knowledgeable can chime in here. The one concern I have about what you are trying to do is that when using a `do concurrent` construct the compiler will check to see if what you are doing in the construct is allowed by the Fortran standard but this won't happen if you sed/awk etc. So if you make a mistake this simple translation may not be appropriate and could lead to unexpected results which may be hard to track down. – Chris Jul 19 '12 at 08:40
