
I am running a parallel Fortran 90 code using MPI. Every rank holds an allocatable 3D array "u" that stores double-precision floating-point numbers; it has the same size on all ranks. Some parts of the code are skipped by a large number of MPI processes, since up to now they were not involved in those computations. The code in the main program looks somewhat like this:

[...] !previous code

if(number_of_samples(rank).gt.0) then
    do i=1,number_of_samples(rank)
        call perform_some_action() !only called by ranks with more than zero samples
    enddo
endif

[...] !subsequent code

The subroutine "perform_some_action" does some computations, which up to now only needed information that was stored within the respective MPI process. With my latest modification however, I sometimes need access to one single value of the array "u" which is stored in a neighbor rank, and not in the rank that performs the calculations within the subroutine "perform_some_action".

My question is: is there a simple way to retrieve, from within "perform_some_action", a single value of the array "u" stored on another rank? I know exactly which rank stores the required value, and I know the position of the value within "u" on that rank.

My first idea was to use MPI_SEND and MPI_RECV; however, I think this is not possible, because the code never reaches the sending rank: since I only call "perform_some_action" from ranks with number_of_samples(rank).gt.0, the subroutine is never entered on ranks with number_of_samples(rank)=0. Still, I sometimes need to access the array "u" of exactly these ranks.

My latest idea was to use MPI_SENDRECV. My subroutine looks like this:

subroutine perform_some_action()

    implicit none
    include 'mpif.h'
    real(8) :: saveValueHere
    [...]
    call MPI_SENDRECV(u(i,j,k),1,MPI_DOUBLE_PRECISION,rank,0,&
                      saveValueHere,1,MPI_DOUBLE_PRECISION,sendingRank,0,&
                      MPI_COMM_Cart,status,ierr)
    [...]
end subroutine

However, using this approach, I always get "MPI_ERR_TRUNCATE: message truncated".

Another idea I had was to create, within the subroutine, a new MPI communicator containing only the receiving rank, and call MPI_BCAST to send the single value. However, MPI_BCAST is obviously not designed for something like point-to-point communication.

Edit: I tried to provide an example for reproduction:

PROGRAM mpitest
USE mpi
IMPLICIT NONE

INTEGER :: ierr, npe, rank, win
INTEGER :: is,ie,js,je,ks,ke
REAL(8), dimension(:,:,:), allocatable :: u
REAL(8) :: saveValueHere

CALL MPI_INIT( ierr )
CALL MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr )
CALL MPI_COMM_SIZE( MPI_COMM_WORLD, npe, ierr )

if(rank.eq.0) then
    is=1
    ie=2
    js=1
    je=2
    ks=1
    ke=2
else if(rank.eq.1) then
    is=3
    ie=4
    js=3
    je=4
    ks=3
    ke=4
endif

allocate(u(is:ie,js:je,ks:ke))
u=0.d0
if(rank.eq.0) then
    u(is,js,ks)=1.d0
    u(is,js,ke)=2.d0
    u(is,je,ks)=3.d0
    u(is,je,ke)=4.d0
    u(ie,js,ks)=5.d0
    u(ie,js,ke)=6.d0
    u(ie,je,ks)=7.d0
    u(ie,je,ke)=8.d0
else if(rank.eq.1) then
    u(is,js,ks)=11.d0
    u(is,js,ke)=12.d0
    u(is,je,ks)=13.d0
    u(is,je,ke)=14.d0
    u(ie,js,ks)=15.d0
    u(ie,js,ke)=16.d0
    u(ie,je,ks)=17.d0
    u(ie,je,ke)=18.d0
endif

if(rank.eq.0) then
    write(*,*) 'get u(3,4,3)=13.d0 from rank 1 and save in saveValueHere'
endif

CALL MPI_FINALIZE(ierr)

END PROGRAM mpitest
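
Below is a minimal, untested sketch of how the one-sided approach suggested in the comments below (an RMA window plus MPI_GET) might be applied to this example; the program name, the passive-target locking, the barrier, and the displacement arithmetic are assumptions for illustration, not part of the original code. Run it with at least two ranks.

PROGRAM mpitest_onesided
USE mpi
IMPLICIT NONE

INTEGER :: ierr, npe, rank, win, sizeofreal
INTEGER :: is,ie,js,je,ks,ke
INTEGER(KIND=MPI_ADDRESS_KIND) :: winsize, disp
REAL(8), dimension(:,:,:), allocatable :: u
REAL(8) :: saveValueHere

CALL MPI_INIT( ierr )
CALL MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr )
CALL MPI_COMM_SIZE( MPI_COMM_WORLD, npe, ierr )

!same decomposition as above: rank 0 owns u(1:2,1:2,1:2), rank 1 owns u(3:4,3:4,3:4)
if(rank.eq.0) then
    is=1; ie=2; js=1; je=2; ks=1; ke=2
else
    is=3; ie=4; js=3; je=4; ks=3; ke=4
endif
allocate(u(is:ie,js:je,ks:ke))
u=0.d0
if(rank.eq.1) u(3,4,3)=13.d0 !the value rank 0 wants to read

!expose the whole local array u in an RMA window; MPI_WIN_CREATE is collective,
!so it must be called by every rank, e.g. right after u is allocated
CALL MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION, sizeofreal, ierr)
winsize = int(size(u),MPI_ADDRESS_KIND)*sizeofreal
CALL MPI_WIN_CREATE(u, winsize, sizeofreal, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)
CALL MPI_BARRIER(MPI_COMM_WORLD, ierr) !make sure all windows exist before any access

if(rank.eq.0) then
    !displacement of u(3,4,3) inside rank 1's window, counted in real(8) elements;
    !Fortran is column-major, so disp = (i-is) + (j-js)*ni + (k-ks)*ni*nj
    disp = (3-3) + (4-3)*2 + (3-3)*2*2
    !passive-target access: only the origin rank makes calls here
    CALL MPI_WIN_LOCK(MPI_LOCK_SHARED, 1, 0, win, ierr)
    CALL MPI_GET(saveValueHere, 1, MPI_DOUBLE_PRECISION, 1, disp, &
                 1, MPI_DOUBLE_PRECISION, win, ierr)
    CALL MPI_WIN_UNLOCK(1, win, ierr)
    write(*,*) 'rank 0 got', saveValueHere !should print 13.0
endif

CALL MPI_WIN_FREE(win, ierr)
CALL MPI_FINALIZE(ierr)

END PROGRAM mpitest_onesided

If something like this worked, only MPI_WIN_CREATE and MPI_WIN_FREE would have to be executed by every rank (they are collective); the lock/get/unlock block could sit inside "perform_some_action" and involves only the rank that needs the value, so ranks with number_of_samples(rank)=0 would never have to enter the subroutine.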
  • If only one process is "active" at the time the message is required to be sent you won't be able to use either of the two-sided routines you discuss above; both the process that requires the data and the process upon which the data is stored must be involved. You might be able to use MPI one-sided routines. However, quite what way is best is very difficult to say without a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – Ian Bush Sep 08 '21 at 16:59
  • Oh, BTW, don't `Include mpif.h` but rather `Use MPI`, it has the chance of catching many more errors at compile time that might otherwise slip through. Similarly `Real( 8 )` is bad practice, might not be supported by a compiler and might not do what you think it should do - see https://stackoverflow.com/questions/838310/fortran-90-kind-parameter – Ian Bush Sep 08 '21 at 17:01
  • MPI one-sided might be a solution, but that has lots of synchronization problems. Easiest is to let everyone `perform_some_action`, but if you have zero samples, only do the send and nothing else. Ask yourself: while the processes with samples do that action, are the other ones doing something completely different simultaneously? If not, then they are just sitting around waiting for the next synchronization point (some all-reduce or whatever), and you might as well let them participate. – Victor Eijkhout Sep 08 '21 at 19:52
  • Thanks for your replies! MPI_GET seems to be a candidate, however I could not find a simple Fortran example how to use it up to now. I assume the call would look something like that: ```call MPI_GET(u_i1_jp_kp,1,MPI_DOUBLE_PRECISION,sender_i1_jp_kp,0,1,MPI_DOUBLE_PRECISION,win)``` - u_i1_jp_kp is the double that should store the value of u(i1,jp,kp) in the rank sender_i1_jp_kp. However I do not understand how to tell MPI_GET that it should fetch the value u(i1,jp,kp) from the sender rank. – tre95 Sep 09 '21 at 08:46
  • Unfortunately I am not supposed to replace ```Include mpif.h``` and ```real(8)```, since it has been used thousands of times by the original developers (I am only working on a small part of it), but thanks for the advice. As for the advice to let all ranks perform the subroutine ```perform_some_action```: Some ranks only have to send info, some only have to receive, some have to do both, and some none of all. All have their own list of samples. Also the number of values that must be sent/received is variable. So it seems too complicated with my MPI experience. – tre95 Sep 09 '21 at 08:50
  • Okay I have also added a reproducible example, sorry for the potentially bad MPI programming but I never really had to touch this area... – tre95 Sep 09 '21 at 11:17
  • What you are trying to do is more natural in coarrays and other PGAS paradigms. One-sided MPI is more complicated: you need to set up a memory window first. – Vladimir F Героям слава Sep 09 '21 at 13:13
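
Regarding the coarray suggestion in the last comment: here is a minimal, untested sketch of what the coindexed remote read could look like. It assumes the code can be compiled with coarray support (e.g. OpenCoarrays with gfortran, or Intel Fortran's -coarray option) and that "u" is declared with the same bounds on every image, since allocatable coarrays require identical bounds on all images; the mapping from the original per-rank bounds to image-local indices is left out. Run with at least two images.

program coarray_sketch
    implicit none
    !u as an allocatable coarray; same bounds on every image (a coarray requirement)
    real(8), allocatable :: u(:,:,:)[:]
    real(8) :: saveValueHere

    allocate(u(1:2,1:2,1:2)[*])
    u = 0.d0
    if(this_image().eq.2) u(1,2,1) = 13.d0 !image 2 plays the role of MPI rank 1
    sync all                               !make the write visible to other images

    if(this_image().eq.1) then
        !one-sided remote read: image 2 does not need to execute anything here
        saveValueHere = u(1,2,1)[2]
        write(*,*) 'image 1 read', saveValueHere, 'from image 2'
    endif
    sync all

    deallocate(u)
end program coarray_sketch

The remote access is a single coindexed assignment; the only synchronization needed is the sync all that orders the write on image 2 before the read on image 1.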
