I have the following example code:
!$omp threadprivate(var)
!$omp parallel do reduction(+:var)
do
  var = var + complicated_floating_point_computation()
end do
!$omp end parallel do
print *,var
I get slightly different results for var from run to run, even when I use the same number of threads. I tried adding the OpenMP clause order(reproducible:concurrent), but got the following compile error:
Error: threadprivate variable 'var' used in a region with 'order(concurrent)' clause
Is there any way to use a reduction and still get reproducible floating-point results when the number of threads stays the same?
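To make the goal concrete, here is a minimal sketch of a manual alternative to the reduction clause: each thread accumulates its own partial sum over a static schedule, and the partial sums are then added serially in thread order, so the order of additions should be the same in every run with the same thread count. It rests on assumptions: term, n, partial, and local are placeholder names that are not in my real code, term stands in for the actual computation, var is not threadprivate here, and the runtime is assumed to deliver the same number of threads each run.

```fortran
program ordered_sum
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000          ! placeholder loop length
  integer :: i, tid, nthreads
  real(dp) :: var, local
  real(dp), allocatable :: partial(:)

  nthreads = omp_get_max_threads()
  allocate(partial(0:nthreads-1))
  partial = 0.0_dp

  !$omp parallel private(tid, local) num_threads(nthreads)
  tid = omp_get_thread_num()
  local = 0.0_dp
  ! schedule(static) without a chunk size gives each thread one contiguous
  ! block of iterations, fixed for a given thread count.
  !$omp do schedule(static)
  do i = 1, n
    local = local + term(i)
  end do
  !$omp end do
  partial(tid) = local                       ! one slot per thread, no race
  !$omp end parallel

  ! Combine the partial sums serially, always in thread order.
  var = 0.0_dp
  do tid = 0, nthreads - 1
    var = var + partial(tid)
  end do
  print *, var

contains

  ! Placeholder for the real floating-point computation (an assumption).
  pure function term(k) result(x)
    integer, intent(in) :: k
    real(dp) :: x
    x = 1.0_dp / real(k, dp)
  end function term

end program ordered_sum
```

This gives up the reduction clause entirely, and the result still changes if the thread count changes, so I would prefer a solution that keeps reduction if one exists.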