It is not that easy to do it automatically. The DO CONCURRENT
construct has a forall-header which means that it could accept multiple loops, index variables definition and a mask. Basically, you need to replace:
DO CONCURRENT([<type-spec> :: ]<forall-triplet-spec 1>, <forall-triplet-spec 2>, ...[, <scalar-mask-expression>])
<block>
END DO
with:
[BLOCK
<type-spec> :: <indexes>]
!$omp parallel do
DO <forall-triplet-spec 1>
DO <forall-triplet-spec 2>
...
[IF (<scalar-mask-expression>) THEN]
<block>
[END IF]
...
END DO
END DO
!$omp end parallel do
[END BLOCK]
(things in square brackets are optional, based on the presence of the corresponding parts in the forall-header)
Note that this would not be as effective as parallelising one big loop with <iters 1>*<iters 2>*...
independent iterations which is what DO CONCURRENT
is expected to do. Note also that forall-header permits a type-spec that allows one to define loop indexes inside the header and you will need to surround the whole thing in BLOCK ... END BLOCK
construct to preserve the semantics. You would also need to check if scalar-mask-expr exists at the end of the forall-header and if it does you should also put that IF ... END IF
inside the innermost loop.
If you only have array assignments inside the body of the DO CONCURRENT
you would could also transform it into FORALL
and use the workshare
OpenMP directive. It would be much easier than the above.
DO CONCURRENT <forall-header>
<block>
END DO
would become:
!$omp parallel workshare
FORALL <forall-header>
<block>
END FORALL
!$omp end parallel workshare
Given all the above, the only systematic way that I can think about is to systematically go through your source code, searching for DO CONCURRENT
and systematically replacing it with one of the above transformed constructs based on the content of the forall-header and the loop body.
Edit: Usage of OpenMP workshare
directive is currently discouraged. It turns out that at least Intel Fortran Compiler and GCC serialise FORALL
statements and constructs inside OpenMP workshare
directives by surrounding them with OpenMP single
directive during compilation which brings no speedup whatsoever. Other compilers might implement it differently but it's better to avoid its usage if portable performance is to be achieved.