
Since MPI also provides MPI_Irecv and MPI_Isend, what are the advantages of using MPI_Send and MPI_Recv, which block the execution of the program and thus lead to lower performance?

In the question "mpi: blocking vs non-blocking", they write:

"Blocking communication is used when it is sufficient, since it is somewhat easier to use. Non-blocking communication is used when necessary, for example, you may call MPI_Isend(), do some computations, then do MPI_Wait(). This allows computations and communication to overlap, which generally leads to improved performance."

But what do "when it is sufficient" and "when necessary" mean here?

elena
  • Possible duplicate of [mpi: blocking vs non-blocking](https://stackoverflow.com/questions/10017301/mpi-blocking-vs-non-blocking). This example covers your question, even though it is not an exact duplicate. – Chiel Jan 24 '18 at 10:45
  • I have read that question but it doesn't really cover my question since it highlights the differences between the two but it doesn't say when it is better to use Send and Recv @Chiel – elena Jan 24 '18 at 10:49
  • I think that the buffer reuse is the main argument given in the explanation. – Chiel Jan 24 '18 at 10:52

2 Answers


First, keep in mind that a correct MPI application should not expect a blocking send (e.g. MPI_Send()) to return before a matching receive has been posted. For example, if two tasks need to exchange data, it is not correct for both of them to call

MPI_Send(...);
MPI_Recv(...);

since it might deadlock. One option is to order the communications manually:

if (peer < me) {
    MPI_Send(...);
    MPI_Recv(...);
} else {
    MPI_Recv(...);
    MPI_Send(...);
}

IMHO, that makes the application harder to write and maintain.

Another option is to use non-blocking communications, so you no longer have to worry about deadlocks:

MPI_Isend(...);
MPI_Irecv(...);
MPI_Waitall(...);

Note that this is a simple example illustrating a more general issue; for this particular exchange, MPI_Sendrecv() should be preferred.

Some MPI libraries implement a progress thread for some interconnects (keep in mind that most do not, but that will hopefully change). In that case, non-blocking communications can be used to overlap computation and communication, and hence make the application more efficient:

MPI_Irecv(...);
// perform some computation that does not require the data being received
MPI_Wait(...);

If your MPI library does not implement a progress thread, then the message may not start being received until MPI_Wait() (or another call that drives progress, such as MPI_Test()) is invoked.

Not all applications can (simply) benefit from overlapping computation and communication. In that case,

MPI_Recv(...);

is not only more compact, but might also be more efficient than the non-blocking counterpart, since it leaves the MPI library more room for optimization compared to

MPI_Irecv(...);
MPI_Wait(...);

Bottom line: blocking is not better than non-blocking, nor the other way around. That being said, on a case-by-case basis, one is generally a better fit than the other.

Gilles Gouaillardet
  • Hi, as you mention, some libraries implement a progress thread but some do not. Is there any way to check this? I am using Open MPI 3.1.3 – Ana Khorguani Feb 09 '20 at 11:22
  • Keep in mind this is also interconnect dependent. So your best bet is to write a small test program and try it on your platform. In the case of Open MPI, upgrading to 4.0.2 won’t hurt. – Gilles Gouaillardet Feb 09 '20 at 11:51
  • I see, thank you. I also found the --disable-progress-threads and --enable-progress-threads configuration options while searching, but I guess they are not runtime parameters, right? – Ana Khorguani Feb 09 '20 at 12:03
  • These are not runtime parameters. Run `ompi_info --all` to list all the runtime parameters. – Gilles Gouaillardet Feb 09 '20 at 12:08

Sometimes you don't have anything else to do (in the current thread) until the Send or Recv operation is complete - for example, you need the result of Recv for your next operation.

In that case, using a blocking MPI_Recv is the best option and is better than using MPI_Irecv and then waiting.

If you do have something else to do (for example, you are sending the previous result while calculating the next one), then non-blocking operations are faster, since you don't have to wait.

Basically, blocking only reduces performance if you have something other than communication to do while the communication is in progress. Otherwise, blocking is the highest-performance way to wait (due to sharing buffers, possible optimisations inside the waiting code, etc.).

Nir
  • Good answer, could you please elaborate on the highest performance for blocking calls...what do you mean with sharing buffers? System or receiver/Sender buffers? Which optimization in the waiting code? Also a reference would be appreciated. Thank you – Millemila Nov 30 '20 at 04:05