
I am fairly new to using MPI. My question is the following: I have a matrix with 2000 rows and 3 columns stored as a 2D array (not contiguous data). Without changing the structure of the array, depending on the number of processes np, each process should get a portion of the matrix. Example: A is a 2D array of 2000 rows by 3 columns; with np = 2, P0 gets the first half of A (the first 1000 rows by 3 columns) and P1 gets the second half (the second 1000 rows by 3 columns). Now np can be any number (as long as it divides the number of rows). Is there an easy way to go about this? I will have to use FORTRAN 90 for this assignment. Thank you

Amani Lama

1 Answer


Row-wise distribution of 2D arrays in Fortran is tricky (but not impossible) to do directly with scatter/gather operations because of Fortran's column-major storage. Two possible solutions follow.

Pure Fortran 90 solution: With Fortran 90 you can specify array sections like A(1:4,2:3), which would take a small 4x2 block out of the matrix A. You can pass array sections to MPI routines. Note that with current MPI implementations (conforming to the now-old MPI-2.2 standard), the compiler creates a temporary contiguous copy of the section data and passes that to the MPI routine (since the lifetime of the temporary storage is not well defined, one should not pass array sections to non-blocking MPI operations like MPI_ISEND). MPI-3.0 introduces a new, very modern Fortran 2008 interface that allows MPI routines to take array sections directly (without intermediate copies) and also supports passing sections to non-blocking calls.

With array sections you only have to implement a simple DO loop in the root process:

INTEGER :: i, rows_per_proc, start_row, end_row

rows_per_proc = 2000/nproc
IF (rank == root) THEN
  DO i = 0, nproc-1
    IF (i /= root) THEN
      start_row = 1 + i*rows_per_proc
      end_row = (i+1)*rows_per_proc
      ! Pass the array section directly; the compiler packs it into a
      ! contiguous temporary before handing it to MPI_SEND
      CALL MPI_SEND(mat(start_row:end_row,:), 3*rows_per_proc, MPI_REAL, &
                    i, 0, MPI_COMM_WORLD, ierr)
    END IF
  END DO
ELSE
  CALL MPI_RECV(submat(1,1), 3*rows_per_proc, MPI_REAL, ...)
END IF
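
If it helps to see the pieces together, here is a minimal self-contained sketch of this approach. It is illustrative only: it assumes the number of rows divides evenly by the number of processes, initialises the matrix with the global row index just to have some data, and has the root simply copy its own portion locally instead of sending it to itself.

PROGRAM scatter_rows
  USE mpi
  IMPLICIT NONE

  INTEGER, PARAMETER :: nrows = 2000, ncols = 3, root = 0
  REAL, DIMENSION(nrows,ncols) :: mat
  REAL, ALLOCATABLE :: submat(:,:)
  INTEGER :: rank, nproc, rows_per_proc, start_row, end_row
  INTEGER :: i, ierr

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

  rows_per_proc = nrows / nproc
  ALLOCATE(submat(rows_per_proc, ncols))

  IF (rank == root) THEN
    ! Fill the matrix with something recognisable (the global row index)
    DO i = 1, nrows
      mat(i,:) = REAL(i)
    END DO
    ! Send every other rank its slice; keep the root's own slice locally
    DO i = 0, nproc-1
      start_row = 1 + i*rows_per_proc
      end_row = (i+1)*rows_per_proc
      IF (i == root) THEN
        submat = mat(start_row:end_row,:)
      ELSE
        CALL MPI_SEND(mat(start_row:end_row,:), rows_per_proc*ncols, &
                      MPI_REAL, i, 0, MPI_COMM_WORLD, ierr)
      END IF
    END DO
  ELSE
    CALL MPI_RECV(submat, rows_per_proc*ncols, MPI_REAL, root, 0, &
                  MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  END IF

  CALL MPI_FINALIZE(ierr)
END PROGRAM scatter_rows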

Pure MPI solution (also works with FORTRAN 77): First, you have to declare a vector datatype with MPI_TYPE_VECTOR. The number of blocks would be 3, the block length would be the number of rows that each process should get (e.g. 1000), and the stride should be equal to the total height of the matrix (e.g. 2000). If this datatype is called blktype, then the following would send the top and bottom halves of the matrix to two different processes:

REAL, DIMENSION(2000,3) :: mat

CALL MPI_SEND(mat(1,1), 1, blktype, p0, ...)
CALL MPI_SEND(mat(1001,1), 1, blktype, p1, ...)

Calling MPI_SEND with blktype would take 1000 elements from the specified starting address, then skip the next 2000 - 1000 = 1000 elements, take another 1000 and so on, 3 times in total. This would form a 1000-row sub-matrix of your big matrix.
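
For reference, blktype itself could be constructed along these lines (a small sketch for the 2000x3 case; the name blktype comes from the text above):

INTEGER :: blktype, ierr

! 3 blocks (one per column), each 1000 contiguous REALs long,
! spaced 2000 elements apart (the column height of the full matrix)
CALL MPI_TYPE_VECTOR(3, 1000, 2000, MPI_REAL, blktype, ierr)
CALL MPI_TYPE_COMMIT(blktype, ierr)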

You can now run a loop to send a different sub-block to each process in the communicator, effectively performing a scatter operation (a sketch of such a loop follows the receive call below). In order to receive its sub-block, the receiving process could simply specify:

REAL, DIMENSION(1000,3) :: submat

CALL MPI_RECV(submat(1,1), 3*1000, MPI_REAL, root, ...)
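
A possible shape for that send loop on the root (a sketch, assuming the row count divides evenly and that blktype was built with a block length of rows_per_proc rather than the hard-coded 1000):

IF (rank == root) THEN
  DO i = 0, nproc-1
    IF (i /= root) THEN
      ! Rank i's sub-block starts at row 1 + i*rows_per_proc of column 1
      CALL MPI_SEND(mat(1 + i*rows_per_proc, 1), 1, blktype, &
                    i, 0, MPI_COMM_WORLD, ierr)
    END IF
  END DO
ELSE
  CALL MPI_RECV(submat(1,1), 3*rows_per_proc, MPI_REAL, root, 0, &
                MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
END IF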

If you are new to MPI, this is all you need to know about scattering matrices by rows in Fortran. If you are well acquainted with how the MPI type system works, then read ahead for a more elegant solution.


(See here for an excellent description by Jonathan Dursi of how to do that with MPI_SCATTERV. His solution deals with splitting a C matrix by columns, which poses essentially the same problem as the one here, since C stores matrices in row-major order. A Fortran version follows.)

You could also make use of MPI_SCATTERV, but it is quite involved. It builds on the pure MPI solution presented above. First you have to resize the blktype datatype into a new type whose extent equals that of MPI_REAL, so that offsets can be specified in array elements. This is needed because offsets in MPI_SCATTERV are given in multiples of the extent of the send datatype, and the extent of blktype spans almost the whole matrix. But because of the strided storage, the two sub-blocks start only 4000 bytes apart (1000 times the typical extent of MPI_REAL). To modify the extent of the type, one would use MPI_TYPE_CREATE_RESIZED:

INTEGER(KIND=MPI_ADDRESS_KIND) :: lb, extent
INTEGER :: blk1b

! Get the extent of MPI_REAL
CALL MPI_TYPE_GET_EXTENT(MPI_REAL, lb, extent, ierr)
! Bestow the same extent upon the brother of blktype
CALL MPI_TYPE_CREATE_RESIZED(blktype, lb, extent, blk1b, ierr)
! The resized type must be committed before it can be used
CALL MPI_TYPE_COMMIT(blk1b, ierr)

This creates a new datatype, blk1b, which has all the characteristics of blktype, e.g. it can still be used to send whole sub-blocks, but when used in scatter/gather operations, MPI advances the data pointer by the size of a single MPI_REAL instead of by the size of the whole matrix. With this new type, you can now position the start of each chunk for MPI_SCATTERV on any element of mat, including the start of any matrix row. Example with two sub-blocks:

INTEGER, DIMENSION(2) :: sendcounts, displs

! First sub-block
sendcounts(1) = 1
displs(1) = 0
! Second sub-block
sendcounts(2) = 1
displs(2) = 1000

CALL MPI_SCATTERV(mat(1,1), sendcounts, displs, blk1b, &
                  submat(1,1), 3*1000, MPI_REAL, &
                  root, MPI_COMM_WORLD, ierr)

Here the displacement of the first sub-block is 0, which coincides with the beginning of the matrix. The displacement of the second sub-block is 1000 elements, i.e. it starts at mat(1001,1), the beginning of the second group of rows in the first column. On the receiver's side the data count argument is 3*1000 elements, which matches the size of the sub-block type.
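
To scatter over an arbitrary number of processes rather than two, the counts and displacements could be filled in a loop. This is a sketch under the assumptions that the number of rows divides evenly by nproc and that blktype (and hence blk1b) was built with a block length of rows_per_proc:

INTEGER, ALLOCATABLE :: sendcounts(:), displs(:)

ALLOCATE(sendcounts(nproc), displs(nproc))
DO i = 1, nproc
  sendcounts(i) = 1                     ! one resized sub-block per rank
  displs(i) = (i-1) * rows_per_proc     ! offset measured in MPI_REAL elements
END DO

CALL MPI_SCATTERV(mat(1,1), sendcounts, displs, blk1b, &
                  submat(1,1), 3*rows_per_proc, MPI_REAL, &
                  root, MPI_COMM_WORLD, ierr)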

Hristo Iliev