Following @Jonathan's comment, I am reframing my question. I have a five-dimensional array whose parts are computed on different processors. Once the chunks have been computed, I need to allgather (in MPI terminology) the elements so that every processor holds the full array.
Following Jonathan's answer (https://stackoverflow.com/a/17530368/2895678) to a similar question on a 2D array, I have built a test program that successfully scatters the chunks via MPI_Scatterv. The code is:
    program scatter
        use mpi
        implicit none

        integer, dimension(2,2,2,6,2) :: gpsi
        integer, dimension(2,2,2,2,2) :: local
        integer, dimension(3) :: displs,counts
        integer :: ierr,rank,comsize
        integer :: p,newtype,i,j,k,l,m,intsize,resizedtype
        integer, dimension(5) :: totsize,subsize,starts
        integer, dimension(MPI_STATUS_SIZE) :: rstatus
        integer(kind=MPI_ADDRESS_KIND) :: extent, begin

        call MPI_Init(ierr)
        call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

        ! counts and displs have three entries, so this test assumes 3 ranks
        gpsi = 0
        if(rank == 0) then
            ! Fill gpsi with 1..96 in memory order and print it on the root
            call f(gpsi)
            do m = 1,2
                do l = 1,6
                    do k = 1,2
                        do j = 1,2
                            do i = 1,2
                                print*,i,j,k,l,m,gpsi(i,j,k,l,m)
                            enddo
                        enddo
                    enddo
                enddo
            enddo
        endif

        ! Subarray type describing one chunk gpsi(:,:,:,1:2,:)
        totsize = [2,2,2,6,2]
        subsize = [2,2,2,2,2]
        starts  = [0,0,0,0,0]
        call MPI_type_create_subarray(5,totsize,subsize,starts, &
                                      MPI_ORDER_FORTRAN, MPI_INTEGER, &
                                      newtype,ierr)

        ! Shrink the extent to one integer so that displs counts in integers
        call MPI_type_size(MPI_INTEGER,intsize,ierr)
        extent = intsize
        begin  = 0
        call MPI_type_create_resized(newtype,begin,extent,resizedtype,ierr)
        call MPI_type_commit(resizedtype,ierr)

        ! One chunk per rank, starting at gpsi(1,1,1,1,1), (1,1,1,3,1), (1,1,1,5,1)
        counts = 1
        displs = [0,16,32]
        call MPI_scatterv(gpsi,counts,displs,resizedtype,local,32, &
                          MPI_INTEGER,0,MPI_COMM_WORLD,ierr)

        ! Print the local chunk rank by rank
        do p = 1,comsize
            if(rank == p-1) then
                print*,'Rank =',rank
                print*,local(:,:,:,:,:)
                print*,''
            endif
            call MPI_barrier(MPI_COMM_WORLD,ierr)
        enddo

        call MPI_Type_free(resizedtype,ierr)
        call MPI_Type_free(newtype,ierr)
        call MPI_Finalize(ierr)
    end program scatter

    ! Treats the 5D array as a flat vector of 96 integers and numbers them 1..96
    subroutine f(psi)
        implicit none
        integer, dimension(96) :: psi
        integer :: i
        do i = 1,96
            psi(i) = i
        enddo
        return
    end
Here, I am sending the chunk gpsi(:,:,:,1:2,:) to proc 1 (i.e. root), gpsi(:,:,:,3:4,:) to proc 2, and gpsi(:,:,:,5:6,:) to proc 3 (for three procs). Note that I have created a dummy subroutine f(psi) that fills all 96 elements of the multi-dimensional array gpsi in contiguous memory order, so that I can correctly work out the displacement array displs, which in this case is displs = [0,16,32] (counted in units of the resized extent, i.e. one integer). After MPI_Scatterv, I correctly get the values 1,2,…,15,16 and 49,50,…,63,64 on proc 1, 17,18,…,31,32 and 65,66,…,79,80 on proc 2, and 33,34,…,47,48 and 81,82,…,95,96 on proc 3.
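As a sanity check on the displacements [0,16,32], here is a minimal sketch that needs no MPI at all. It assumes the same 2x2x2x6x2 shape and the same 1..96 fill as in the test program, and simply computes the column-major offset of the first element of each chunk. The names chunk_offsets, linear_offset, and show_displs are only illustrative and are not part of the program above.

```fortran
! Illustrative sketch only: module and function names are hypothetical.
module chunk_offsets
  implicit none
  integer, parameter :: dims(5) = [2, 2, 2, 6, 2]   ! shape of gpsi
contains
  ! Zero-based column-major offset, in elements, of gpsi(idx(1),...,idx(5)).
  pure integer function linear_offset(idx) result(off)
    integer, intent(in) :: idx(5)
    integer :: d, stride
    off = 0
    stride = 1
    do d = 1, 5
      off = off + (idx(d) - 1) * stride
      stride = stride * dims(d)
    end do
  end function linear_offset
end module chunk_offsets

program show_displs
  use chunk_offsets
  implicit none
  integer :: gpsi(2,2,2,6,2)
  integer :: p, lstart, i

  ! Same fill as f(gpsi): the n-th element in memory order holds the value n.
  gpsi = reshape([(i, i = 1, 96)], shape(gpsi))

  do p = 1, 3
    lstart = 2*(p - 1) + 1            ! chunks 1:2, 3:4, 5:6 along dimension 4
    print '(a,i0,a,i0)', 'chunk ', p, ' starts at element offset ', &
          linear_offset([1, 1, 1, lstart, 1])
    ! Each chunk is two runs of 16 values, one per index of dimension 5.
    print '(a,i0,a,i0)', '  m=1 run begins with ', gpsi(1,1,1,lstart,1), &
          ', m=2 run begins with ', gpsi(1,1,1,lstart,2)
  end do
end program show_displs
```

The offsets come out as 0, 16, and 32, because one step along the fourth dimension skips 2*2*2 = 8 elements and each chunk spans two such steps. The second run of every chunk starts 48 elements later (2*2*2*6), which is why proc 1 sees 49,…,64 after 1,…,16.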
My actual problem, however, is that I would like to scatter chunks of different sizes and allgather them afterwards. For example, I might want to send gpsi(:,:,:,1:2,:) to proc 1, gpsi(:,:,:,3,:) to proc 2, and gpsi(:,:,:,4:6,:) to proc 3. Each subarray now has a different size, so in principle each processor needs a different data type. How should the subarrays be constructed in that case? I would highly appreciate any help.
Regards,
Madhurjya