Is MPI_Gather the best choice?

Question

There are 4 processes, and one of them (process 0) is the master, which has to build the following matrix C:

-1  0  0 -1  0
 0 -1  0  0 -1
-1  1  1 -1  1
 1 -1  1  1 -1
-1  2  2 -1  2
 2 -1  2  2 -1
-1  3  3 -1  3
 3 -1  3  3 -1
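Before involving MPI, it may help to state the pattern as a rule. Rank r owns rows 2r+1 and 2r+2. In row 2r+1 the -1 entries are in columns -2 and +1, in row 2r+2 they are in columns -1 and +2, and every other entry is r. The serial sketch below builds the target matrix directly, so a gathered result can be compared against it. The module and routine names are hypothetical.

```fortran
! Serial reference for the target matrix (hypothetical helper, no MPI).
module reference_matrix
  implicit none
contains
  ! Rank r owns rows 2r+1 and 2r+2; entries that are not -1 equal r.
  subroutine build_reference(nprocs, C)
    integer, intent(in) :: nprocs
    real, intent(out) :: C(2*nprocs, -2:2)
    integer :: r
    C = -1.0                          ! background value set by process 0
    do r = 0, nprocs - 1
      ! Row 2r+1:  -1   r   r  -1   r
      C(2*r+1, [-1, 0, 2]) = real(r)
      ! Row 2r+2:   r  -1   r   r  -1
      C(2*r+2, [-2, 0, 1]) = real(r)
    end do
  end subroutine build_reference
end module reference_matrix
```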

To do so, the matrix is declared as REAL, DIMENSION(:,:), ALLOCATABLE :: C and allocated with

IF (myid == 0) THEN
        ALLOCATE(C(2*nprocs,-2:+2))
END IF

where nprocs is the number of processes. Process 0 also sets C = -1. For the communication, I first tried

CALL MPI_GATHER((/0.0+myid,0.0+myid/),&
              & 2,MPI_REAL,&
              & C(:,0),&
              & 2,MPI_REAL,&
              & 0,MPI_COMM_WORLD,ierr)

to fill up the central column, and this worked. Then I tried with

CALL MPI_GATHER((/0.0+myid, 0.0+myid, 0.0+myid, 0.0+myid/),&
              & 4,MPI_REAL,&
              & (/C(1:2*nprocs:2,-1),C(2:2*nprocs:2,-2),C(1:2*nprocs:2,+2),C(2:2*nprocs:2,+1)/),&
              & 4,MPI_REAL,&
              & 0,MPI_COMM_WORLD,ierr)

to fill the other columns, but it didn't work and gave errors like the following:

Fortran runtime error: Index '1' of dimension 1 of array 'c' outside of expected range (140735073734712:140735073734712).

To understand why, I tried to fill the first column alone with the call

CALL MPI_GATHER((/0.0-myid/),&
              & 1,MPI_REAL,&
              & C(1:2*nprocs:2,-2),&
              & 1,MPI_REAL,&
              & 0,MPI_COMM_WORLD,ierr)

but the same happened, more or less.

I solved the problem by allocating C for all the processes (i.e. regardless of the process id). Why does this make the call work?

After this I made a small change (before trying again to fill all the columns at once), simply wrapping the receive buffer in (/.../):

CALL MPI_GATHER((/0.0-myid/),&
              & 1,MPI_REAL,&
              & (/C(1:2*nprocs:2,-2)/),&
              & 1,MPI_REAL,&
              & 0,MPI_COMM_WORLD,ierr)

but this makes the call ineffective (no errors, but not even one element in C changed).

I hope someone can explain:

What's wrong with the constructor (/.../) in the receive buffer?
Why does the receive buffer have to be allocated on the non-root processes?
Is it necessary to use MPI_Gatherv to accomplish the task?
Is there a better way to build up such a matrix?

EDIT: Is it possible to use MPI derived data types to build the matrix?

Yes, it is possible to use derived types, but that is a topic for a new question. – Vladimir F Героям слава, Mar 24 '16 at 10:36

Vladimir F Героям слава · Accepted Answer · 2016-03-11T18:29:26.310

First, use `use mpi` instead of `include 'mpif.h'` if you are not doing so already; the compiler might then catch some of these errors.

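You cannot use an array constructor as a receive buffer. Why? The array created by a constructor is an expression, and you cannot use it where a variable is required. In practice the compiler evaluates the constructor into a temporary array, MPI writes into that temporary, and the temporary is then discarded, which is why no element of C changed.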

In the same way, you cannot pass 1+1 to a subroutine that changes its argument. 1+1 is an expression, and you need a variable if it is to be changed.

Secondly, every array into which you write or from which you read must be allocated. In MPI_Gather the receive buffer is significant only at the root and is ignored on all non-root processes. BUT when you make a subarray like C(1:2*nprocs:2,-2) from C, C itself must be allocated. This is a Fortran rule, not an MPI one.

If the number of elements received from each rank is the same, you can use MPI_Gather; you don't need MPI_Gatherv.

You may consider just receiving the data into a 1D buffer and reorder them as necessary. Another option is to decompose it along the last dimension instead.
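Here is a minimal sketch of the 1D-buffer approach. It assumes each rank sends six reals: its entries in columns -1, 0 and +2 of row 2*myid+1, and in columns -2, 0 and +1 of row 2*myid+2. That also covers the central column, so one call replaces both gathers from the question. MPI_Gather stores the contributions in rank order (rank 0's six values, then rank 1's, and so on). That rank-major order also differs from the column-by-column order of the question's (/.../) expression, so the root has to scatter the values into C explicitly. The module and routine names are hypothetical. The MPI call appears only in comments, so the packing logic can be compiled and checked without MPI.

```fortran
! Sketch: gather into a 1D buffer, then reorder on the root.
! Assumed layout: each rank contributes nslot = 6 reals.
module gather_reorder
  implicit none
  integer, parameter :: nslot = 6
  ! Destination of slot j of a rank's contribution: row offset (1 or 2) and column.
  integer, parameter :: slot_row(nslot) = [1, 1, 1, 2, 2, 2]
  integer, parameter :: slot_col(nslot) = [-1, 0, 2, -2, 0, 1]
contains
  ! Fill the send buffer of rank myid (every entry it owns equals myid).
  subroutine pack_sendbuf(myid, sendbuf)
    integer, intent(in) :: myid
    real, intent(out) :: sendbuf(nslot)
    sendbuf = real(myid)
  end subroutine pack_sendbuf

  ! Root only: recvbuf holds rank 0's nslot values, then rank 1's, and so on.
  subroutine unpack_recvbuf(nprocs, recvbuf, C)
    integer, intent(in) :: nprocs
    real, intent(in) :: recvbuf(nslot*nprocs)
    real, intent(inout) :: C(2*nprocs, -2:2)
    integer :: k, j
    do k = 0, nprocs - 1
      do j = 1, nslot
        C(2*k + slot_row(j), slot_col(j)) = recvbuf(nslot*k + j)
      end do
    end do
  end subroutine unpack_recvbuf
end module gather_reorder

! Intended use with MPI (recvbuf allocated on every rank, e.g. size 1 off-root;
! C = -1.0 already set on the root):
!   CALL pack_sendbuf(myid, sendbuf)
!   CALL MPI_GATHER(sendbuf, nslot, MPI_REAL, recvbuf, nslot, MPI_REAL, &
!                   0, MPI_COMM_WORLD, ierr)
!   IF (myid == 0) CALL unpack_recvbuf(nprocs, recvbuf, C)
```

Only recvbuf and the final unpack depend on the root. C never appears in the MPI call, so no section of an unallocated C is ever formed on the other ranks.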

edited Mar 11 '16 at 18:29

answered Mar 11 '16 at 18:24

OK about the first and third points (I can't use an expression as an argument, and there is no need for MPI_Gatherv). About the second: the matrix `C` is allocated on process 0, so where is the problem? Finally, what do you mean by "decompose it along the last dimension"? – Enlico Mar 11 '16 at 20:39
Well, it is allocated on the first process, but you are making a subarray of it on all processes! – Vladimir F Героям слава Mar 11 '16 at 20:41
Decompose along the last dimension means something like 'C(-2:2,2*nprocs)' or similar. – Vladimir F Героям слава Mar 11 '16 at 20:42
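The last-dimension decomposition from the comment above can be sketched as follows. The sketch assumes the root stores the transposed matrix as Ct(-2:2, 2*nprocs); the name Ct and the module and routine names are made up here. Fortran is column-major, so each rank's two columns form one contiguous run of 10 reals. A single MPI_Gather with count 10 therefore drops every contribution in place, and no reordering step is needed. As before, the MPI call is shown only in comments.

```fortran
! Sketch: decompose along the last dimension, Ct(-2:2, 2*nprocs) = TRANSPOSE(C).
! Rank r owns columns 2r+1 and 2r+2, i.e. 10 contiguous reals starting at
! element 10*r + 1 of Ct, which is where MPI_GATHER puts rank r's data.
module last_dim_layout
  implicit none
contains
  ! Rows 2*myid+1 and 2*myid+2 of the original C, stored as two columns.
  subroutine local_block(myid, blk)
    integer, intent(in) :: myid
    real, intent(out) :: blk(-2:2, 2)
    real :: r
    r = real(myid)
    blk(:, 1) = [-1.0, r, r, -1.0, r]
    blk(:, 2) = [r, -1.0, r, r, -1.0]
  end subroutine local_block
end module last_dim_layout

! Intended use with MPI (Ct allocated on every rank, e.g. Ct(-2:2, 0) off-root):
!   CALL local_block(myid, blk)
!   CALL MPI_GATHER(blk, 10, MPI_REAL, Ct, 10, MPI_REAL, &
!                   0, MPI_COMM_WORLD, ierr)
```

If the original orientation is needed afterwards, C = TRANSPOSE(Ct) on the root recovers it.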
Why on all processes, if the receive buffer is ignored on all non-root processes? – Enlico Mar 11 '16 at 20:44
Because it is ignored only after you pass something legal, and the program crashes before it gets that far. Making a subarray of an unallocated array is simply impossible. Don't do it; use an IF condition or something similar to avoid it. – Vladimir F Героям слава Mar 11 '16 at 20:47
Can you give me a reference where this is extensively explained, please? I really don't understand... – Enlico Mar 11 '16 at 22:00
You can't do C(:) or similar when C is not allocated. It is as simple as that. I doubt this exact case is treated specially anywhere; you can do almost nothing with a non-allocated array. – Vladimir F Героям слава Mar 11 '16 at 22:11
The point that is not clear to me is the following: there are 10 (for instance) copies of `C`, and the only one that is allocated is the only one that is used. So why do the other 9 non-allocated copies of `C` cause the problem you describe if they are not used at all? – Enlico Mar 11 '16 at 22:23
You DO use all of them. Doing C(:) is making a subarray of an array. It IS using the array and it is illegal for a non allocated array. – Vladimir F Героям слава Mar 11 '16 at 22:46
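The IF guard suggested in the comments can be sketched as follows. The goal is for the sketch to compile without MPI, so fake_gather is a hypothetical stand-in for MPI_GATHER with one real per rank. Like the real routine, it writes the receive buffer only on the root. It also assumes rank k sends -k, mirroring the question's diagnostic call. The section C(1:2*nprocs:2,-2) is formed only on the root, where C is allocated; every other rank passes a small local array instead. The module and routine names are hypothetical.

```fortran
! Sketch of the IF guard: form a section of C only where C is allocated.
module guarded_gather
  implicit none
contains
  ! Hypothetical stand-in for MPI_GATHER (one real per rank). Like the real
  ! routine it writes recvbuf only on the root; rank k is assumed to send -k.
  subroutine fake_gather(myid, recvbuf)
    integer, intent(in) :: myid
    real, intent(inout) :: recvbuf(:)
    integer :: k
    if (myid /= 0) return              ! recvbuf is not significant off-root
    do k = 1, size(recvbuf)
      recvbuf(k) = -real(k - 1)
    end do
  end subroutine fake_gather

  subroutine fill_first_column(myid, nprocs, C)
    integer, intent(in) :: myid, nprocs
    real, allocatable, intent(inout) :: C(:,:)   ! allocated on the root only
    real :: dummy(1)
    if (myid == 0) then
      ! Only the root forms the strided section of C.
      call fake_gather(myid, C(1:2*nprocs:2, -2))
    else
      ! Other ranks never touch C; they pass a small local array instead.
      call fake_gather(myid, dummy)
    end if
  end subroutine fill_first_column
end module guarded_gather

! The same guard with MPI would read:
!   IF (myid == 0) THEN
!     CALL MPI_GATHER(sendval, 1, MPI_REAL, C(1:2*nprocs:2,-2), 1, MPI_REAL, &
!                     0, MPI_COMM_WORLD, ierr)
!   ELSE
!     CALL MPI_GATHER(sendval, 1, MPI_REAL, dummy, 1, MPI_REAL, &
!                     0, MPI_COMM_WORLD, ierr)
!   END IF
```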
