Say I have a Fortran program that performs two tasks on an array: task A computes its mean and task B doubles it. The point is that task B should be independent from task A. When accelerating the program with OpenACC, it would make sense to run the two tasks concurrently by making task A asynchronous:
program test
implicit none
integer, parameter :: n = 1000000
real(8) :: mean
real(8) :: array(n)
real(8) :: array_d(n)
! initialize array
array = [(i, i=1, n)]
!$acc kernels async num_gangs(1)
! Task A: get mean of array
mean = 0d0
!$acc loop independent reduction(+:mean)
do i = 1, n
mean = mean + array(i)
end do
mean = mean / n
!$acc end kernels
!$acc kernels
! Task B: work on array
!$acc loop independent
do i = 1, n
array(i) = array(i) * 2
end do
!$acc end kernels
!$acc wait
!$acc end data
! print array and mean
print "(10(g0.2, x))", array(:10)
print "('mean = ', g0.2)", mean
end program
However, when running the two tasks at the same time, task B will modify the array that task A is reading, leading to incorrect values. On CPU (no acceleration) I get:
2.0 4.0 6.0 8.0 10. 12. 14. 16. 18. 20.
mean = 500000.5000000000
On GPU (using the NVIDIA HPC SDK), I get a different mean which is obviously incorrect:
2.0 4.0 6.0 8.0 10. 12. 14. 16. 18. 20.
mean = 999967.6836640000
Is there an elegant way to "protect" the array being worked by task A?