
I have an MPI program which oversubscribes/overcommits its processors. That is: there are many more processes than processors.

Only a few of these processes are active at a given time, though, so there shouldn't be contention for computational resources.

But, much like the flock of seagulls from Finding Nemo, when those processes are waiting for communication they're all busy-looping, asking "Mine? Mine? Mine?"

(Image: Nemo seagulls)

I am using both Intel MPI and OpenMPI (for different machines). How can I convince them both not to busy loop?

My quick and dirty solution has been to use MPI_Iprobe in a loop with a sleep command (see here).
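
A minimal sketch of that poll-and-sleep loop, not the exact code from the linked post. So that it compiles without an MPI installation, the probe is abstracted behind a hypothetical callback (`poll_fn`); in a real program the callback would wrap `MPI_Iprobe` and return its `flag`. The names `wait_with_sleep` and `ready_after_n_calls` are made up for illustration, and the sleep interval is an assumption you would tune.

```c
#define _POSIX_C_SOURCE 199309L  /* for nanosleep */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns true once a message is available.
   In an MPI program this would wrap something like
   MPI_Iprobe(source, tag, comm, &flag, &status) and return flag != 0. */
typedef bool (*poll_fn)(void *ctx);

/* Poll until poll() reports readiness, sleeping sleep_ns nanoseconds
   between attempts so the waiting process gives up the CPU instead of
   spinning. Returns the number of times it slept. */
static long wait_with_sleep(poll_fn poll, void *ctx, long sleep_ns)
{
    assert(poll != NULL);
    assert(sleep_ns >= 0 && sleep_ns < 1000000000L);

    struct timespec pause = { .tv_sec = 0, .tv_nsec = sleep_ns };
    long sleeps = 0;

    while (!poll(ctx)) {
        nanosleep(&pause, NULL);
        ++sleeps;
    }
    return sleeps;
}

/* Stand-in for a real probe: reports "ready" after a fixed number of
   unsuccessful calls. ctx points to an int holding that count. */
static bool ready_after_n_calls(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        --*remaining;
        return false;
    }
    return true;
}
```

The trade-off is latency: a longer sleep frees more CPU for the active processes, but a message can sit unnoticed for up to one sleep interval.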

Richard
  • Are you running all processes on a single host? – Hristo Iliev May 06 '16 at 19:13
  • No, they may be distributed across a number of hosts. – Richard May 06 '16 at 19:15
  • What kind of network connects the hosts? – Hristo Iliev May 06 '16 at 19:32
  • I actually have that issue on a single host, just haven't had the chance to do my own research on it. Thus I would be really interested if there was a general answer, not just for a particular BTL. – Zulan May 06 '16 at 19:51
  • @Zulan, I don't believe there is a general switch for that. Each BTL has its own synchronisation and notification needs and preferences based on the hardware beneath. – Hristo Iliev May 06 '16 at 20:13
  • @HristoIliev: it would be ideal if the answer were not dependent on the particulars of the network. In one scenario I am using, everything is on the same host. In another there are multiple hosts. – Richard May 06 '16 at 20:17
  • As something of a product plug, you could look at using [Adaptive MPI](http://charm.cs.illinois.edu/manuals/html/ampi/manual.html) to oversubscribe MPI ranks without oversubscribing processes, and hence not having a busy-wait condition. As a bonus, it can load-balance the active ranks among processes (and hence processors). – Phil Miller May 08 '16 at 15:32

1 Answer


It's been a while since this was asked, but this post may have the answer you're looking for. (tl;dr: pass --mca mpi_yield_when_idle 1 as a parameter to mpirun if you're using OpenMPI.)

Other than that, if your MPI processes are waiting at MPI barriers, you can set I_MPI_WAIT_MODE=1 to prevent the busy loop in Intel MPI. For OpenMPI, see the linked post.

drewtu2