Using mpi_scatterv with 4D fortran array

Question

I'm trying to break up a 4D array over the third dimension, and send to each node using MPI. Basically, I'm computing derivatives of a matrix, Cpq, with respect to atom positions in each of the three cartesian directions. Cpq is of size nat_sl x nat_sl, so dCpqdR is of size nat_sl x nat_sl x nat x 3. At the end of the day, for ever s,i pair, I have to compute the matrix product of dCpqdR between the transpose of the eigenvectors of Cpq and the eigenvectors of Cpq like so:

temp  = MATMUL(TRANSPOSE(Cpq), MATMUL(dCpqdR(:, :, s, i), Cpq))

This is fine, but as it turns out, the loop over s and i is now by far the slow part of my code. Because each can be done independently, I was hoping that I could break up dCpqdR, and give each task it's own s, i to compute the derivative of. That is, I'd like task 1 to get dCpqdR(:,:,1,1), task 2 to get dCpqdR(:,:,1,2), etc.

I've got this working in some sense by using a buffered send/recv pair of calls. The root node allocates a temporary array, fills it, sends to the relevant nodes, and the relevant nodes do their computations as they wish. This is fine, but can be slow and memory inefficient. I'd ideally like to break it up in a more memory efficient way.

The logical thing to do, then, is to use mpi_scatterv, but here is where I start running into trouble, as I'm having trouble figuring out the memory layout for this. I've written this, so far:

    call mpi_type_create_subarray(4, (/ nat_sl, nat_sl, nat, 3 /), (/nat_sl, nat_sl, n_pairs(me_image+1), 3/),&
                                  (/0, 0, 0, 0/), mpi_order_fortran, mpi_double_precision, subarr_typ, ierr)
    call mpi_type_commit(subarr_typ, ierr)

    call mpi_scatterv(dCpqdR, n_pairs(me_image+1), f_displs, subarr_typ,&
                      my_dCpqdR, 3*nat_sl*3*nat_sl*3*n_pairs(me_image+1), subarr_typ,&
                      root_image, intra_image_comm, ierr)

I've computed n_pairs using this subroutine:

subroutine mbdvdw_para_init_int_forces()
implicit none
integer :: p, s, i, counter, k, cpu_ind
integer :: num_unique_rpq, n_pairs_per_proc, cpu
real(dp) :: Rpq(3), Rpq_norm, current_val

num_pairs = nat
if(.not.allocated(f_cpu_id))        allocate(f_cpu_id(nat, 3))
n_pairs_per_proc = floor(dble(num_pairs)/nproc_image)
cpu = 0
n_pairs = 0

counter = 1
p = 1
do counter = 0, num_pairs-1, 1
    n_pairs(modulo(counter, nproc_image)+1) = n_pairs(modulo(counter, nproc_image)+1) + 1
    end do

do s = 1, nat, 1
    f_cpu_id(s) = cpu
    if((counter.lt.num_pairs)) then
        if(p.eq.n_pairs(cpu+1)) then
            cpu = cpu + 1
            p = 0
            end if
        end if
    p = p + 1
    end do

call mp_set_displs( n_pairs, f_displs, num_pairs, nproc_image)
f_displs = f_displs*nat_sl*nat_sl*3

end subroutine mbdvdw_para_init_int_forces

and the full method for the matrix multiplication is

subroutine mbdvdw_interacting_energy(energy, forcedR, forcedh, forcedV)
    implicit none
    real(dp), intent(out) :: energy
    real(dp), dimension(nat, 3), intent(out) :: forcedR
    real(dp), dimension(3,3), intent(out) :: forcedh
    real(dp), dimension(nat), intent(out) :: forcedV
    real(dp), dimension(3*nat_sl, 3*nat_sl) :: temp
    real(dp), dimension(:,:,:,:), allocatable :: my_dCpqdR
    integer :: num_negative, i_atom, s, i, j, counter
    integer, parameter :: eigs_check    = 200
    integer :: subarr_typ, ierr

    ! lapack work variables
    integer :: LWORK, errorflag
    real(dp) :: WORK((3*nat_sl)*(3+(3*nat_sl)/2)), eigenvalues(3*nat_sl)

    call start_clock('mbd_int_energy')
    call mp_sum(Cpq, intra_image_comm)
    eigenvalues = 0.0_DP
    forcedR = 0.0_DP
    energy = 0.0_DP
    num_negative = 0
    forcedV = 0.0_DP

    errorflag=0
    LWORK=3*nat_sl*(3+(3*nat_sl)/2)
    call DSYEV('V', 'U', 3*nat_sl, Cpq, 3*nat_sl, eigenvalues, WORK, LWORK, errorflag)

    if(errorflag.eq.0) then
        do i_atom=1, 3*nat_sl, 1
            !open (unit=eigs_check, file="eigs.tmp",action="write",status="unknown",position="append")
            !    write(eigs_check, *) eigenvalues(i_atom)
            !close(eigs_check)
            if(eigenvalues(i_atom).ge.0.0_DP) then
                energy = energy + dsqrt(eigenvalues(i_atom))
            else
                num_negative = num_negative + 1
            end if
        end do

        if(num_negative.ge.1) then
            write(stdout, '(3X," WARNING: Found ", I3, " Negative Eigenvalues.")'), num_negative
        end if
        else
    end if

    energy = energy*nat/nat_sl

    !!!!!!!!!!!!!!!!!!!!
    ! Forces below here. There's going to be some long parallelization business.
    !!!!!!!!!!!!!!!!!!!!

    call start_clock('mbd_int_forces')

    if(.not.allocated(my_dCpqdR)) allocate(my_dCpqdR(nat_sl, nat_sl, n_pairs(me_image+1), 3)), my_dCpqdR = 0.0_DP



    if(mbd_vdw_forces) then
        do s=1,nat,1
            if(me_image.eq.(f_cpu_id(s)+1)) then
                do i=1,3,1
                    temp  = MATMUL(TRANSPOSE(Cpq), MATMUL(my_dCpqdR(:, :, counter, i), Cpq))
                    do j=1,3*nat_sl,1
                        if(eigenvalues(j).ge.0.0_DP) then
                            forcedR(s, i) = forcedR(s, i) + 1.0_DP/(2.0_DP*dsqrt(eigenvalues(j)))*temp(j,j)
                            end if
                        end do
                    end do
                    counter = counter + 1
                end if
            end do
        forcedR = forcedR*nat/nat_sl

        do s=1,3,1
            do i=1,3,1
                    temp  = MATMUL(TRANSPOSE(Cpq), MATMUL(dCpqdh(:, :, s, i), Cpq))
                    do j=1,3*nat_sl,1
                        if(eigenvalues(j).ge.0.0_DP) then
                            forcedh(s, i) = forcedh(s, i) + 1.0_DP/(2.0_DP*dsqrt(eigenvalues(j)))*temp(j,j)
                            end if
                        end do
                end do
            end do
        forcedh = forcedh*nat/nat_sl
        call mp_sum(forcedR, intra_image_comm)
        call mp_sum(forcedh, intra_image_comm)
    end if
    call stop_clock('mbd_int_forces')

    call stop_clock('mbd_int_energy')
    return
end subroutine mbdvdw_interacting_energy

But when run, it's complaining that

[MathBook Pro:58100] *** An error occurred in MPI_Type_create_subarray
[MathBook Pro:58100] *** reported by process [2560884737,2314885530279477248]
[MathBook Pro:58100] *** on communicator MPI_COMM_WORLD
[MathBook Pro:58100] *** MPI_ERR_ARG: invalid argument of some other kind
[MathBook Pro:58100] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[MathBook Pro:58100] ***    and potentially your MPI job)

so something is going wrong, but I have no idea what. I know my description is somewhat sparse to start with, so please let me know what information would be necessary to help.

Ideally there'd be a [working example](http://stackoverflow.com/help/mcve) so we could try to reproduce the problem. All the arguments I understand to subarray seem fine - what is `n_pairs`? Note too that you'll need one more step after defining the array type - you'll have to set the extent of the type, as well, as described [in this answer](http://stackoverflow.com/questions/17508647/sending-2d-arrays-in-fortran-with-mpi-gather/17530368#17530368). — Jonathan Dursi, Apr 05 '15 at 02:17
Make sure that `n_paris(me_image+1)` is no bigger than `nat` as is also positive (i.e. greater than zero), otherwise the type constructor will fail with `MPI_ERR_ARG`. — Hristo Iliev, Apr 05 '15 at 10:49
I've updated the question with full examples of how everything is computed. — Thomas, Apr 05 '15 at 14:26
That a huge load of code. I would simply print the values of the arguments to `mpi_type_create_subarray`. — Hristo Iliev, Apr 08 '15 at 11:28

Using mpi_scatterv with 4D fortran array

0 Answers0