
I am trying to parallelize a distributed hydrological model using MPI (in Fortran). The model reads its input data from a file, and most of the inputs are 2D arrays. My goal is to decompose each global 2D array into smaller 2D chunks and send one chunk to each core, do some computation on the chunk, and finally gather the results back to the root.

I followed the wonderful answer by Jonathan Dursi at Sending 2D arrays in Fortran with MPI_Gather, and I wanted to add more flexibility to cover the case where the 2D data domain is not evenly divisible by the number of processor cores in one or both dimensions.

I have also seen the solution to the same problem based on MPI_Alltoallw at Scatter Matrix Blocks of Different Sizes using MPI, again by Jonathan Dursi. That method needs four different block types to be created. It is a bit difficult for me to implement because I don't know the exact size of the global domain in advance: the model is intended to be applied to domains of different sizes, i.e. to different river basins. The model also has a lot of input data to communicate between cores (some subroutines take up to 15 different 2D arrays).

So I tried to solve the problem this way:

  1. Create optimal division of processors in each dimension of the Cartesian grid
  2. Define a new Cartesian communicator (with periods=.true.; this is very important)
  3. Create a new datatype with MPI_Type_create_subarray (with equally sized chunks. The chunks are not actually equally sized, but I forced them to be by relying on periods=.true. in both dimensions, and this is where the problem occurs; see the worked numbers right after this list)
  4. Scatter the data with MPI_Scatterv
  5. Do the computation
  6. Gather the results with MPI_Gatherv (I know the gather truncates some data, but that is not a problem because that part of the data is not needed)
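To make the sizes concrete (the numbers follow from the 10x10 test case further down): with 6 processes, MPI_Dims_create gives dims = (3, 2) here, so the forced chunk size is subsizes = (ceiling(10/3), ceiling(10/2)) = (4, 5). The 6 chunks then cover 3*4 = 12 rows by 2*5 = 10 columns, i.e. 120 elements of a 100-element array, and I was hoping periods=.true. would make the 2 extra rows wrap back to the top of the same column. With 4 processes, dims = (2, 2) and subsizes = (5, 5), so the chunks tile the array exactly, which is presumably why that case works.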

The problem is that periods=.true. has no effect, or else I misunderstand what periods means in MPI_Cart_create. My understanding was that if a dimension is periodic, then after the end of the data in that dimension the read wraps around to the beginning of that dimension (i.e. the data at the beginning is taken twice: first by, say, process 0 and again by the last process in that dimension). But that is not what happens: the last process in that dimension picks up the data at the beginning of the next row, and after that everything goes wrong. Had periods behaved the way I expected, I think the approach would have been correct.
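For what it is worth, my mental picture of periods comes from the topology level, where periodic dimensions wrap neighbour queries around. Below is a minimal, separate toy program (not part of the model; names like probe_periods are just made up for this test) that shows what I mean using MPI_Cart_shift. Whether this wrap-around has any influence on how MPI_Scatterv walks the send buffer is exactly what I am unsure about.

program probe_periods
    USE mpi_f08
    implicit none

    TYPE(MPI_Comm) :: comcart
    integer :: comrank, comsize, ierror
    integer :: dims(2), src, dst
    logical :: periodic(2)

    call MPI_Init(ierror)
    call MPI_Comm_size(MPI_COMM_WORLD, comsize, ierror)

    !Let MPI pick the process grid, then build a fully periodic Cartesian communicator
    dims = 0
    call MPI_Dims_create(comsize, 2, dims, ierror)
    periodic = .true.
    call MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periodic, .false., comcart, ierror)
    call MPI_Comm_rank(comcart, comrank, ierror)

    !Neighbours one step along the first dimension (0-based direction 0).
    !With periodic=.true. even the last rank in that dimension gets a valid
    !wrap-around neighbour instead of MPI_PROC_NULL
    call MPI_Cart_shift(comcart, 0, 1, src, dst, ierror)
    write(*,*) 'Rank ', comrank, ' neighbours along dim 1: ', src, dst

    call MPI_Finalize(ierror)
end program probe_periods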

System and Environment (Ubuntu 21.10, GFortran 11.2.0, Open MPI 4.1.1)

I would be very happy if you have any idea why periods=.true. is not behaving as expected.

I am testing with a 10 by 10 array of random numbers, and I paste everything here; please pardon me if a 10x10 matrix is a lot for your eyes. It works fine with 4 processes and gives an incorrect answer with 6 processes.

program topo
    USE mpi_f08

    implicit none

!VARIABLE DECLARATION ==> related to the application
integer, parameter :: rows = 10, cols = 10
integer, dimension(rows*cols) :: file
integer, dimension(rows, cols) :: array_a
integer, allocatable, dimension(:,:) :: array_b
integer :: i, j, p

!VARIABLE DECLARATION ==> related to MPI
TYPE(MPI_Comm) :: comcart, comm=MPI_COMM_WORLD
integer :: comrank, comsize, ierror

!VARIABLE DECLARATION ==> related to Topology
integer, parameter :: ndims = rank(array_a)
integer :: dims(ndims)
logical :: periodic(ndims), reorder

!VARIABLE DECLARATION ==> related to SUBARRAY
TYPE(MPI_Datatype) :: newtype, resizedtype
integer(kind=MPI_ADDRESS_KIND) :: extent, lb
integer, dimension(ndims) :: starts, subsizes, gsizes
integer :: intsize

!VARIABLE DECLARATION ==> related to MPI_Scatterv
integer, allocatable, dimension(:) :: displs, counts

!Initialize MPI environment
call MPI_Init(ierror)
call MPI_Comm_rank(comm, comrank, ierror)
call MPI_Comm_size(comm, comsize, ierror)

!Read input data from file (please treat this as if it were read from a file)
if (comrank==0) then
    file=(/7,4,5,5,6,1,7,6,8,9,&
           8,7,4,1,8,0,0,5,0,8,&
           1,7,8,3,2,7,5,5,6,6,&
           7,1,8,8,2,4,5,1,3,0,&
           3,3,0,4,0,3,4,5,9,2,&
           1,9,7,5,8,2,0,7,9,7,&
           0,1,0,2,1,7,3,7,2,1,&
           9,3,0,1,3,0,5,2,0,2,&
           6,7,8,0,6,6,0,2,6,8,&
           0,2,3,4,2,8,3,6,2,3/)

!open(unit=20, file="random_num.txt")
!    read(20, 100) array_a
!close(20)

    array_a=reshape(file,[10, 10])
    write(*,*) "The original array from file"
    write(*,100) array_a
    write(*,*) "=============================================="
        100 format(10i10)
end if

!Specify the number of cores in each dimension. Use (0, 0) to let MPI decide
dims = 0 

!Create optimal division of processors in each dimension of the Cartesian grid
Call MPI_Dims_create(comsize, ndims, dims, ierror)

!!Check the allocated cores in each dimension
!if (comrank==0) then
!    write(*,*) "The number of cores in Y and X directions are: ", dims
!end if

!Initialize some of the inputs for MPI_Cart_create
    periodic = .true.
    reorder = .false.

!Define a new communicator
Call MPI_Cart_create(comm, ndims, dims, periodic, reorder, comcart, ierror)

!Update ranks and size based on the new communicator (just in case)
call MPI_Comm_rank(comcart, comrank, ierror)
call MPI_Comm_size(comcart, comsize, ierror)

!Initialize some of the inputs for MPI_Type_create_subarray
gsizes=shape(array_a)

!Here is the trick to make equal-sized chunks (this relies on periods being .true.)
subsizes=ceiling(real(gsizes)/real(dims))   ! Or you can activate the alternatives below if you don't like this expression

!subsizes(1)=ceiling(real(gsizes(1))/real(dims(1)))
!subsizes(2)=ceiling(real(gsizes(2))/real(dims(2)))

starts = 0

!!Check the global and local sizes in each dimension
!if (comrank==0) then
!    write(*,*) "The number of global grids in Y and X directions are: ", gsizes
!    write(*,*) "The number of subgrids in Y and X directions are: ", subsizes
!end if

!Create new datatype with MPI_Type_create_subarray
call MPI_Type_create_subarray(ndims, gsizes, subsizes, starts, MPI_ORDER_FORTRAN, MPI_INTEGER, newtype, ierror)

!Set the extent of the newtype
call MPI_Type_size(MPI_INTEGER, intsize, ierror)
    extent = subsizes(1)*intsize
    lb = 0

!Resize newtype: set its lower bound to lb; its upper bound becomes lb + extent
call MPI_Type_create_resized(newtype, lb, extent, resizedtype, ierror)

!Prepare the new datatype (resizedtype) for use
call MPI_Type_commit(resizedtype, ierror)

!Initialize some of the inputs for MPI_Scatterv
allocate(array_b(subsizes(1), subsizes(2)))
allocate (displs(comsize))
allocate (counts(comsize))

counts = 1  ! we will send one of these new types to everyone

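!Displacements are given in units of the resized extent (subsizes(1) integers):
!move +1 per block along dimension 1 and +subsizes(2)*dims(1) per block along dimension 2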
do j = 1, dims(2)
    do i = 1, dims(1)
    displs(1+(i-1)+dims(1)*(j-1))=(i-1)+subsizes(2)*dims(1)*(j-1)
    end do
end do
!write(*,*) "Displs are: ", displs

!Everything is ready. Let's scatter the data
call MPI_Scatterv(array_a, counts, displs, resizedtype, array_b, subsizes(1)*subsizes(2), MPI_INTEGER, 0, comcart, ierror)

!Check the received chunk of data at each core
do p=1, comsize
    if (comrank == p-1) then
        write (*, *) 'Rank ', comrank, ' received: '
        do i=1, subsizes(2)
            write(*, *) array_b(:, i)
        end do
    end if
    call MPI_Barrier(comcart, ierror)
end do

!Do some computation in each core
where (array_b==0)
    array_b=999
end where

!Check the chunk of data at each core after the computation, i.e. the data about to be sent
do p=1, comsize
   if (comrank == p-1) then
       write(*, *) 'Rank ', comrank, ' sending: '
       do i=1, subsizes(2)
           write(*, *) array_b(:,i)
       end do
   end if
   call MPI_Barrier(comcart, ierror)
end do

!Gather the computation results back into the root
call MPI_Gatherv(array_b, subsizes(1)*subsizes(2), MPI_INTEGER, array_a, counts, displs, resizedtype, 0, comcart, ierror)

!Check the final result collected at the root
if (comrank == 0) then
    write(*, *) ' Root received: '
    do i=1,gsizes(2)
        write(*, *) array_a(:, i)
    end do
end if

!Deallocate resources associated with the committed type
call MPI_Type_free(resizedtype,ierror)

!Deallocate allocated arrays
deallocate(array_b)
deallocate(displs)
deallocate(counts)

!Terminate MPI environment
call MPI_Finalize(ierror)

end program topo

THE RESULT FROM 6 PROCESSES LOOKS LIKE THIS:

The original array from file
     7         4         5         5         6         1         7         6         8         9
     8         7         4         1         8         0         0         5         0         8
     1         7         8         3         2         7         5         5         6         6
     7         1         8         8         2         4         5         1         3         0
     3         3         0         4         0         3         4         5         9         2
     1         9         7         5         8         2         0         7         9         7
     0         1         0         2         1         7         3         7         2         1
     9         3         0         1         3         0         5         2         0         2
     6         7         8         0         6         6         0         2         6         8
     0         2         3         4         2         8         3         6         2         3
==================================================================

 Rank            0  received:

       7           4           5           5
       8           7           4           1
       1           7           8           3
       7           1           8           8
       3           3           0           4

 Rank            1  received: 
       6           1           7           6
       8           0           0           5
       2           7           5           5
       2           4           5           1
       0           3           4           5

 Rank            2  received: 
       8           9           8           7
       0           8           1           7
       6           6           7           1
       3           0           3           3
       9           2           1           9

Rank            3  received: 
       0           1           0           2
       9           3           0           1
       6           7           8           0
       0           2           3           4
       0           0           3           4

 Rank            4  received: 
       1           7           3           7
       3           0           5           2
       6           6           0           2
       2           8           3           6
       0           0   640473088       22028

     Rank            5  received: 
       2           1           9           3
       0           2           6           7
       6           8           0           2
       2           3           0           0
640467440       22028  -487235792       32764

 Rank            0  sending: 
       7           4           5           5
       8           7           4           1
       1           7           8           3
       7           1           8           8
       3           3         999           4

 Rank            1  sending: 
       6           1           7           6
       8         999         999           5
       2           7           5           5
       2           4           5           1
     999           3           4           5

 Rank            2  sending: 
       8           9           8           7
     999           8           1           7
       6           6           7           1
       3         999           3           3
       9           2           1           9

Rank            3  sending: 
     999           1         999           2
       9           3         999           1
       6           7           8         999
     999           2           3           4
     999         999           3           4

 Rank            5  sending: 
       2           1           9           3
     999           2           6           7
       6           8         999           2
       2           3         999         999
640467440       22028  -487235792       32764

 Rank            4  sending: 
       1           7           3           7
       3         999           5           2
       6           6         999           2
       2           8           3           6
     999         999   640473088       22028

Root received: 
       7           4           5           5           6           1           7           6           8           9
       8           7           4           1           8         999         999           5         999           8
       1           7           8           3           2           7           5           5           6           6
       7           1           8           8           2           4           5           1           3         999
       3           3         999           4         999           3           4           5           9           2
       1           9           7           5           8           2           0           7           9           7
     999           1         999           2           1           7           3           7           2           1
       9           3         999           1           3         999           5           2         999           2
       6           7           8         999           6           6         999           2           6           8
     999           2           3           4           2           8           3           6           2           3

RUN FINISHED; exit value 0; real time: 200ms; user: 210ms; system: 210ms

I just reformatted the code and the results for better visualization. Thanks

Endalk
  • I am pretty sure you can quote your code correctly so it is displayed properly on this web interface. It looks like you misunderstand how the `periodic` argument works, and I do not think it matters for `MPI_Scatterv()` whether the communicator is Cartesian or `MPI_COMM_WORLD` here. You can add extra columns to your send buffer filled with `-1` and see if this is what is being received instead of the "random" values. – Gilles Gouaillardet Nov 20 '21 at 03:44
  • Pro tip: when building your initial array, use values corresponding to the rank that should receive them. You might have some surprises on ranks other than 5. – Gilles Gouaillardet Nov 20 '21 at 06:31
  • Thank you so much @GillesGouaillardet, I reformatted the code a bit as per your recommendation. You said **you can add extra columns to your send buffer filled with -1 and see if this is what is being received instead of the "random" values**. That is the whole point of this code. How can I append extra rows and/or columns to an existing 2D array? If that is possible, it would be easier to send equal-sized chunks in one go, do the computation on every chunk, and finally gather the result while truncating the parts from the extra rows and/or columns. – Endalk Nov 20 '21 at 11:20
  • Simply declare a larger array, copy `array_a` into it and fill the extra columns. – Gilles Gouaillardet Nov 20 '21 at 12:10
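My reading of that suggestion, as an untested sketch (array_pad and padded are names I made up; the two declarations would go with the other declarations, and the executable lines right after subsizes is computed in the program above):

!Untested sketch of the padding idea: enlarge the global array to exactly
!dims*subsizes so every chunk is genuinely equal-sized, mark the extra cells
!with a sentinel, and scatter/gather the padded array instead of array_a
!(the subarray type would then be built with gsizes replaced by padded)
integer, allocatable, dimension(:,:) :: array_pad
integer, dimension(ndims) :: padded

padded = dims*subsizes                                ! e.g. (/12, 10/) with 6 processes
allocate(array_pad(padded(1), padded(2)))
array_pad = -1                                        ! sentinel in the padding cells
if (comrank==0) array_pad(1:rows, 1:cols) = array_a   ! copy the real data on the root

After the gather, the root would simply read back array_pad(1:rows, 1:cols) and ignore the padding.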
