Array access in async OpenACC kernels

Question

Say I have a Fortran program that performs two tasks on an array: task A computes its mean and task B doubles it. The idea is that task B should be independent of task A. When accelerating the program with OpenACC, it would then make sense to run the two tasks concurrently by making task A asynchronous:

program test
    implicit none
    integer, parameter :: n = 1000000
    integer :: i
    real(8) :: mean
    real(8) :: array(n)
    real(8) :: array_d(n)

    ! initialize array
    array = [(i, i=1, n)]

    ! device copy of array, closed by the "!$acc end data" below
    !$acc data copy(array)

    !$acc kernels async num_gangs(1)
    ! Task A: get mean of array
    mean = 0d0
    !$acc loop independent reduction(+:mean)
    do i = 1, n
        mean = mean + array(i)
    end do
    mean = mean / n
    !$acc end kernels

    !$acc kernels
    ! Task B: work on array
    !$acc loop independent
    do i = 1, n
        array(i) = array(i) * 2
    end do
    !$acc end kernels

    !$acc wait
    !$acc end data

    ! print array and mean
    print "(10(g0.2, x))", array(:10)
    print "('mean = ', g0.2)", mean
end program

However, when the two tasks run at the same time, task B modifies the array while task A is still reading it, so task A computes an incorrect mean. On the CPU (no acceleration) I get:

2.0 4.0 6.0 8.0 10. 12. 14. 16. 18. 20.
mean = 500000.5000000000

On the GPU (using the NVIDIA HPC SDK), I get a different mean, which is clearly incorrect:

2.0 4.0 6.0 8.0 10. 12. 14. 16. 18. 20.
mean = 999967.6836640000
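As a quick check on why the GPU value is wrong, here is the arithmetic, with n = 10^6 as in the program. The first expression is the mean of the original values 1..n, which the CPU run reproduces; the second is what task A would get if it had read only already-doubled values:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2} = 500000.5,
\qquad
\frac{1}{n}\sum_{i=1}^{n} 2i = n + 1 = 1000001
```

The GPU result, roughly 999967.7, is very close to the second value, which is consistent with task A having read mostly elements that task B had already doubled.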

Is there an elegant way to "protect" the array while task A is reading it?
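The most direct protection, sketched below under the assumption that giving up the GPU-side overlap is acceptable, is to enqueue both tasks on the same async queue: OpenACC runs operations on one queue in the order they were enqueued, so task B cannot start until task A has finished, while the host stays free until the `wait`. The module and subroutine names (`same_queue_tasks`, `mean_then_double`) are hypothetical, `parallel loop` is used in place of the question's `kernels` regions, and the sketch assumes OpenACC 2.7 or later, where a reduction variable on a combined construct is implicitly copied back to the host.

```fortran
module same_queue_tasks
  ! Hypothetical module: Task A (mean) and Task B (doubling) share one
  ! async queue, so Task B is ordered after Task A on the device.
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: mean_then_double
contains
  subroutine mean_then_double(array, mean)
    real(real64), intent(inout) :: array(:)
    real(real64), intent(out)   :: mean
    integer :: i, n

    n = size(array)
    mean = 0.0_real64

    !$acc data copy(array)

    ! Task A: sum the original values on async queue 1.
    ! (Assumes OpenACC >= 2.7: the reduction variable is implicitly copied.)
    !$acc parallel loop reduction(+:mean) async(1)
    do i = 1, n
      mean = mean + array(i)
    end do

    ! Task B: same queue 1, so it starts only after Task A completes.
    !$acc parallel loop async(1)
    do i = 1, n
      array(i) = array(i) * 2
    end do

    ! The host could do unrelated work here before blocking on queue 1.
    !$acc wait(1)
    !$acc end data

    mean = mean / n
  end subroutine mean_then_double
end module same_queue_tasks
```

This gives the correct mean but no overlap between A and B on the device. Compiled without OpenACC, the directives are plain comments and the subroutine runs serially on the CPU with the same result.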

Task B is *not* independent of Task A as described: there's a (very explicit) data dependency, meaning Task A must complete before Task B. Independence is not just a function of the operations performed; it also depends on the data being processed. — Ian Bush, Sep 15 '21 at 12:17
One way to break the data dependency would be to use a copy of "array" in Task A with the old values. — Mat Colgrove, Sep 15 '21 at 16:09
I see the problem, and what I misunderstood about independence. So I should copy the array, but only on the GPU. How can I do that? — Neraste, Sep 16 '21 at 03:26
Sorry, if it's not just `a_copy=a` I can't help you. But note that copying will probably take almost as long as finding the average or doubling the array, so by Amdahl's law I suspect the speedup you can get from doing A and B at the same time will be very limited. — Ian Bush, Sep 16 '21 at 06:36
Also note that real( 8 ) is not portable: it might not be supported by a compiler, and if it is, it might not do what you think it does. See https://stackoverflow.com/questions/838310/fortran-90-kind-parameter . Personally, nowadays I use the iso_fortran_env route mentioned in the comments. — Ian Bush, Sep 16 '21 at 06:38
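A sketch of the device-side copy suggested in the comments, assuming the snapshot lives only in GPU memory: `array_d` is created on the device (never copied to or from the host), filled from `array` in a synchronous loop, and then task A reads `array_d` on one async queue while task B doubles `array` on another, so the two tasks no longer touch the same data and the runtime may overlap them. The module and subroutine names (`copy_tasks`, `mean_and_double`) are hypothetical, `real64` from `iso_fortran_env` replaces `real(8)` per the portability comment, and the same OpenACC 2.7+ reduction-copy assumption applies.

```fortran
module copy_tasks
  ! Hypothetical module: Task A reads a device-side snapshot (array_d) of
  ! the old values while Task B doubles the live array on another queue.
  use, intrinsic :: iso_fortran_env, only: real64   ! portable kind, not real(8)
  implicit none
  private
  public :: mean_and_double
contains
  subroutine mean_and_double(array, mean)
    real(real64), intent(inout) :: array(:)
    real(real64), intent(out)   :: mean
    real(real64), allocatable   :: array_d(:)
    integer :: i, n

    n = size(array)
    allocate(array_d(n))
    mean = 0.0_real64

    ! array_d only needs device memory: create it, never copy it back.
    !$acc data copy(array) create(array_d)

    ! Snapshot the old values on the device. This loop is synchronous,
    ! so it finishes before either task is launched.
    !$acc parallel loop
    do i = 1, n
      array_d(i) = array(i)
    end do

    ! Task A: mean of the snapshot, on async queue 1.
    !$acc parallel loop reduction(+:mean) async(1)
    do i = 1, n
      mean = mean + array_d(i)
    end do

    ! Task B: double the live array on async queue 2; it may overlap
    ! with Task A because Task A no longer reads array.
    !$acc parallel loop async(2)
    do i = 1, n
      array(i) = array(i) * 2
    end do

    ! Wait for both queues before leaving the data region.
    !$acc wait
    !$acc end data

    mean = mean / n
    deallocate(array_d)
  end subroutine mean_and_double
end module copy_tasks
```

As noted in the comments, the snapshot loop is itself a full pass over the array, so the net gain over simply running A and then B may be small; it is worth timing both variants. Without OpenACC the directives are ignored and `array_d` is just a host copy, which gives the same results serially.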
