I've been fighting for a couple of days with an MPI_REDUCE call in a gfortran atmospheric transport model: the input parameters look fine, but the results returned in the master's recvbuf are wildly unreasonable. I've been able to reproduce the problem in a simple example, as follows:
PROGRAM TEST
  USE mpi
  IMPLICIT NONE

  INTEGER :: my_rank, size, ierror
  INTEGER, PARAMETER :: nx=1000, ny=1000, nz=6
  INTEGER :: buffsize
  REAL, DIMENSION(nx, ny, nz) :: u, v

  CALL MPI_INIT(ierror)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierror)
  PRINT *, 'my_rank, size: ', my_rank, size

  ! Each rank fills its array with (rank + 1), so after an MPI_SUM
  ! reduction over 2 ranks every element of v on rank 0 should be 3.0.
  buffsize = nx*ny*nz
  u = my_rank + 1

  PRINT *, 'PE: ', my_rank, ', Before reduce, SHAPE(u): ', SHAPE(u)
  PRINT *, 'PE: ', my_rank, ', Before reduce, SUM(u): ', SUM(u)

  CALL MPI_REDUCE(u, v, buffsize, MPI_REAL, &
                  MPI_SUM, 0, MPI_COMM_WORLD, ierror)
  CALL MPI_BARRIER(MPI_COMM_WORLD, ierror)

  PRINT *, 'PE: ', my_rank, ', after reduce, ierror: ', ierror
  PRINT *, 'PE: ', my_rank, ', after reduce, SUM(u): ', SUM(u)
  PRINT *, 'PE: ', my_rank, ', after reduce, SUM(v): ', SUM(v)

  CALL MPI_FINALIZE(ierror)
END PROGRAM test
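I build it with the usual MPI compiler wrapper, something like this (the source file name here is just a guess; the executable name matches the run below):

  mpif90 test3.f90 -o test3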
It returns:
mpirun -np 2 ./test3
my_rank, size: 0 2
my_rank, size: 1 2
PE: 1 , Before reduce, SHAPE(u): 1000 1000 6
PE: 0 , Before reduce, SHAPE(u): 1000 1000 6
PE: 0 , Before reduce, SUM(u): 6000000.00
PE: 1 , Before reduce, SUM(u): 12000000.0
PE: 0 , after reduce, ierror: 0
PE: 1 , after reduce, ierror: 0
PE: 1 , after reduce, SUM(u): 12000000.0
PE: 0 , after reduce, SUM(u): 6000000.00
PE: 1 , after reduce, SUM(v): 0.00000000
PE: 0 , after reduce, SUM(v): 18407592.0
PE0 "should" be showing 18000000.0 as the SUM(v) in the last line.
If I change the nz parameter in the code from 6 to 5, the run produces correct results. What's really confusing is that it misbehaves the same way, returning the same (wrong) sum of reduced values, on a) an AWS EC2 instance with gfortran 5.3 and OpenMPI, b) my laptop with gfortran 5.4 and MPICH, and c) a workstation with gfortran 4.4 and OpenMPI.
If I change the arrays to DOUBLE PRECISION (as well as specifying that type in the MPI_REDUCE call), it works fine, even for much larger arrays. If I use REAL*4 rather than REAL, it produces the same bad results.
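For reference, the double precision variant only changes the declaration and the datatype in the reduce call, roughly like this (a sketch of the two changed lines relative to the program above):

  DOUBLE PRECISION, DIMENSION(nx, ny, nz) :: u, v

  CALL MPI_REDUCE(u, v, buffsize, MPI_DOUBLE_PRECISION, &
                  MPI_SUM, 0, MPI_COMM_WORLD, ierror)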
I know this has to be something simple and that I'm probably being an idiot here, but I'm just not seeing it. I've read suggestions that the buffer size needs to be an integer value less than 2^31-1, but that is certainly the case here.
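For what it's worth, the count being passed is nx*ny*nz = 6,000,000, which is far below that limit; a quick check along these lines (assuming the default 32-bit INTEGER kind) confirms it:

  PRINT *, 'count   = ', nx*ny*nz   ! 6000000
  PRINT *, 'HUGE(0) = ', HUGE(0)    ! 2147483647 = 2**31 - 1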