
I am starting to develop a parallel code for scientific applications. I have to exchange some buffers from p0 to p1 and from p1 to p0 (I am creating ghost points between processor boundaries).

The error can be summarized by this sample code:

program test
use mpi
implicit none

integer id, ids, idr, ierr, tag, istat(MPI_STATUS_SIZE)
real sbuf, rbuf

call mpi_init(ierr)

call MPI_COMM_RANK(MPI_COMM_WORLD,id,ierr)

if(id.eq.0) then
   ids=0
   idr=1
   sbuf=1.5
   tag=id
else
   ids=1
   idr=0
   sbuf=3.5
   tag=id
endif

call mpi_send(sbuf,1,MPI_REAL,ids,tag,MPI_COMM_WORLD,ierr)

call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)

call mpi_finalize(ierr)
return
end

What is wrong with this?

alie
  • Welcome to SO. Your question is not quite clear. Please read carefully [ask] and add a [mcve] to your question. – Zulan Nov 12 '17 at 11:39
  • Hello Zulan, sorry my question is not clear, I will try to simplify a bit. I have 2 processes (rank=0 and rank=1). I need to exchange a vector from 0 to 1, and, at the same time, a vector from 1 to 0. How can I perform this communication? – alie Nov 12 '17 at 12:56
  • Welcome. Your code is too incomplete. We need to see something which we can compile and test, including all variable declarations and so on. – Vladimir F Героям слава Nov 12 '17 at 17:01
  • I recommend you read here: https://stackoverflow.com/questions/10017301/mpi-blocking-vs-non-blocking. You must understand the difference between blocking and non-blocking operations. – Ross Nov 12 '17 at 20:18

2 Answers


Coding with MPI can be difficult at first, and it's good that you're going through the steps of making a sample code. Your sample code as posted hangs due to deadlock. Both processes are busy MPI_SEND-ing, and the send cannot complete until it has been MPI_RECV-ed. So the code is stuck.

There are two common ways around this problem.

Send and Receive in a Particular Order

This is the simple and easy-to-understand solution. Code your send and receive operations such that nobody ever gets stuck. For your 2-process test case, you could do:

if (id==0) then
   call mpi_send(sbuf,1,MPI_REAL,ids,tag,MPI_COMM_WORLD,ierr)
   call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)
else
   call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)
   call mpi_send(sbuf,1,MPI_REAL,ids,tag,MPI_COMM_WORLD,ierr)
endif

Now, process 1 receives first, so there is never a deadlock. This particular example is not extensible, but there are various looping structures that can help. You can imagine a routine to send data from every process to every other process as:

do sending_process = 0, nproc-1
   if (id == sending_process) then
      ! -- I am sending
      do destination_process = 0, nproc-1
         if (sending_process == destination_process) cycle
         call MPI_SEND ! Send to destination_process
      enddo
   else
      ! -- I am receiving
      call MPI_RECV ! Receive from sending_process
   endif
enddo

This works reasonably well and is easy to follow. I recommend this structure for beginners.
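
As a rough compilable sketch of that loop (the subroutine name, the single-real payload, and the zero tag are placeholders I picked for illustration), it might look like:

! -- Illustrative only: every rank sends one real to every other rank,
!    ordered so that only one rank is sending at any time
subroutine exchange_all(id, nproc, sbuf, rbuf)
   use mpi
   implicit none
   integer, intent(in) :: id, nproc
   real, intent(in)    :: sbuf
   real, intent(out)   :: rbuf(0:nproc-1)
   integer :: sender, dest, ierr, istat(MPI_STATUS_SIZE)

   do sender = 0, nproc-1
      if (id == sender) then
         ! -- My turn: send to every other rank in order
         do dest = 0, nproc-1
            if (dest == id) cycle
            call mpi_send(sbuf, 1, MPI_REAL, dest, 0, MPI_COMM_WORLD, ierr)
         enddo
      else
         ! -- Everyone else receives from the current sender
         call mpi_recv(rbuf(sender), 1, MPI_REAL, sender, 0, &
                       MPI_COMM_WORLD, istat, ierr)
      endif
   enddo
end subroutine exchange_all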

However, it has several issues for truly large problems. You are sending a number of messages equal to the number of processes squared, which can overload a large network. Also, depending on your operation, you probably do not need to send data from every process to every other process. (I suspect this is true for you given you mentioned ghosts.) You can modify the above loop to only send if data are required, but for those cases there is a better option.

Use Non-Blocking MPI Operations

For many-core problems, this is often the best solution. I recommend sticking to the simple MPI_ISEND and MPI_IRECV. You start all the necessary sends and receives, and then wait for them to complete. In the sketch below I am using a list structure, set up beforehand, that defines the complete set of necessary destinations for each process.

! -- Open sends
do d=1,Number_Destinations
   idest = Destination_List(d)
   call MPI_ISEND ! To destination idest
enddo

! -- Open receives
do s=1,Number_Senders
   isrc = Senders_List(s)
   call MPI_IRECV ! From source isrc
enddo

call MPI_WAITALL

This option may look simpler, but it is not. You must set up all the necessary lists beforehand, and there are a variety of potential problems with buffer sizes and data alignment. Even so, it is typically the best answer for big codes.
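
For your two-rank test, a minimal non-blocking sketch could look like this (the shared tag 0 and the names `other` and `req` are just choices I made for the example):

program isend_test
   use mpi
   implicit none
   integer :: id, other, ierr
   integer :: req(2), istats(MPI_STATUS_SIZE,2)
   real :: sbuf, rbuf

   call mpi_init(ierr)
   call mpi_comm_rank(MPI_COMM_WORLD, id, ierr)

   other = 1 - id            ! the only other rank in a 2-process run
   if (id == 0) then
      sbuf = 1.5
   else
      sbuf = 3.5
   endif

   ! -- Start the send and the receive; neither call blocks
   call mpi_isend(sbuf, 1, MPI_REAL, other, 0, MPI_COMM_WORLD, req(1), ierr)
   call mpi_irecv(rbuf, 1, MPI_REAL, other, 0, MPI_COMM_WORLD, req(2), ierr)

   ! -- Wait for both to complete before using sbuf or rbuf again
   call mpi_waitall(2, req, istats, ierr)

   print *, 'rank', id, 'received', rbuf

   call mpi_finalize(ierr)
end program isend_test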

Ross
  • Dear Ross, Thank you very much! This is a really good answer. I'll try tomorrow to follow your guide! Well, I am setting up a code for fluid dynamics (in serial it works great). So in principle, each processor should know its neighbour cells, and at every time step these ghost cells must be updated with the new values of velocity etc. coming from the neighbouring partition. I think that some of your structures should be good for this work. – alie Nov 12 '17 at 20:58
  • I also work in CFD. What problem size (cells and number of processors) are you targeting? – Ross Nov 12 '17 at 20:58
  • It is a code for unstructured meshes (the data structure of the mesh is the same as OpenFOAM). My target is 3D unsteady flows in general, so let's say up to 64 procs and on the order of a million cells :) It is a personal project and I am planning to use it for future projects. But first I need the parallelization :) – alie Nov 12 '17 at 21:11
  • The first option will be best for you, I suspect. It is easier to debug and monitor performance, and its limitations shouldn't be a big deal at your problem size. – Ross Nov 12 '17 at 22:01

As pointed out by Vladimir, your code is too incomplete to provide a definitive answer.

That being said, this could be a well-known error.

MPI_Send() might block. From a pragmatic point of view, MPI_Send() is likely to return immediately when sending a short message, but is likely to block when sending a large message. Note that "small" and "large" depend on your MPI library, the interconnect you are using, and other runtime parameters. MPI_Send() might block until a matching MPI_Recv() is posted on the other end.

It seems you call MPI_Send() and MPI_Recv() in the same block of code, so you can try using MPI_Sendrecv() to do both in one shot. MPI_Sendrecv() will issue a non-blocking send under the hood, so that will help if your issue is really an MPI_Send() deadlock.
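
As a sketch for your two-rank exchange (the peer rank `other` and the common tag 0 are placeholders I chose for the example):

program sendrecv_test
   use mpi
   implicit none
   integer :: id, other, ierr, istat(MPI_STATUS_SIZE)
   real :: sbuf, rbuf

   call mpi_init(ierr)
   call mpi_comm_rank(MPI_COMM_WORLD, id, ierr)

   other = 1 - id           ! rank 0 talks to rank 1 and vice versa
   if (id == 0) then
      sbuf = 1.5
   else
      sbuf = 3.5
   endif

   ! -- Send to the peer and receive from it in one call; the same
   !    tag is used on both sides so the messages always match
   call mpi_sendrecv(sbuf, 1, MPI_REAL, other, 0, &
                     rbuf, 1, MPI_REAL, other, 0, &
                     MPI_COMM_WORLD, istat, ierr)

   print *, 'rank', id, 'received', rbuf

   call mpi_finalize(ierr)
end program sendrecv_test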

Gilles Gouaillardet
  • I have put a sample code to recreate the problem. Using mpi_sendrecv I still get a deadlock. Thanks – alie Nov 12 '17 at 18:16
  • Your tags do not match, so your test can only deadlock. – Gilles Gouaillardet Nov 12 '17 at 19:06
  • Ok, but since I am a noob with MPI, could you please provide an example (if you have time)? Meanwhile I'll try to work on what you have suggested! Thanks – alie Nov 12 '17 at 19:21
  • You can use a unique tag for all your communications – Gilles Gouaillardet Nov 12 '17 at 19:26
  • I have tried to put tag=0 in the send and recv calls. But my test deadlocks again... – alie Nov 12 '17 at 19:34
  • I think this is bad advice, especially for somebody new to MPI. Stick to simple send/receive to start, and make them nonblocking (`MPI_ISEND`). `MPI_SENDRECV` is more convoluted and it can be difficult to extend, in my opinion. – Ross Nov 12 '17 at 20:07
  • Also, you should *never* code as though `MPI_SEND` will be non-blocking, regardless of the message size. – Ross Nov 12 '17 at 20:09
  • WRT Sendrecv I disagree. In my experience MPI_Sendrecv often simplifies code compared to separate sends and recvs, and avoids the confusion that async messages can bring, especially for beginners. But this is getting opinion-based so I won't say more. Of course you must always code as though MPI_Send and MPI_Recv are blocking. – Ian Bush Nov 12 '17 at 20:59
  • `ids` is wrong (you send to yourself), it should be identical to `idr` – Gilles Gouaillardet Nov 12 '17 at 21:05
  • @Ross how is non blocking plus wait(all) less convoluted than a single call to `MPI_Sendrecv()` ? – Gilles Gouaillardet Nov 12 '17 at 21:11
  • For learning, I think it's most useful to start from the basics. Which is why I recommended re-ordering the sends and receives. For performance, you have to instruct a particular order to pair off send/receives, which is challenging for large distributed cases. But I agree with @IanBush that this is mostly opinion-based. – Ross Nov 12 '17 at 21:45
  • My major issue with your answer is that you don't make it clear that the problem is deadlock, and instead propose a different solution entirely. – Ross Nov 12 '17 at 21:47
  • When the question was initially posted, it was unclear whether the issue was a deadlock or not. With the latest code, the issue is with tags and peers (and the message is very small and hence unlikely to deadlock) – Gilles Gouaillardet Nov 12 '17 at 22:53
  • A fair point. I hadn't noticed the edit which substantially clarified the question. I still don't recommend treating any MPI_SEND as non-blocking though. Edit: apparently I can't remove my downvote... – Ross Nov 13 '17 at 17:15
  • no one ever said `MPI_Send()` should be assumed to return immediately :-) – Gilles Gouaillardet Nov 13 '17 at 22:36