
I'm building a distributed web server in C/MPI and it seems like point-to-point communication completely stops working after the first MPI_BARRIER in my code. Standard C code works after the barrier, so I know that each of the processes makes it through the barrier. Point-to-point communication also works just fine before the barrier. However, when I copy-paste the same code that worked the line before the barrier into the line after the barrier, it stops working entirely. The SEND will just wait forever. When I try using an ISEND instead, it makes it through the line, but the message is never received. I've been googling this problem a lot, and everyone who has problems with MPI_BARRIER is told the barrier works correctly and their code is wrong, but I cannot for the life of me figure out why my code is wrong. What could be causing this behavior?

Here is a sample program that demonstrates this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
  int procID;
  int val;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &procID);
  MPI_Barrier(MPI_COMM_WORLD);

  if (procID == 0)
  {
    val = 4;
    printf("Before send\n");
    MPI_Send(&val, 1, MPI_INT, 1, 4, MPI_COMM_WORLD);
    printf("after send\n");
  }

  if (procID == 1)
  {
    val = 1;
    printf("before: val = %d\n", val);
    MPI_Recv(&val, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    printf("after: val = %d\n", val);
  }

  MPI_Finalize();
  return 0;
}

Moving the two if statements before the barrier causes this program to run correctly.
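
For reference, I'm compiling and launching the sample roughly like this (the source file name and the two host names here are just stand-ins for my actual setup):

mpicc barrier_test.c -o barrier_test
mpirun -np 2 -host node1,node2 ./barrier_test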

EDIT - It appears that the first communication, regardless of type, works, and all future communications fail. This is much more general than I thought at first. It doesn't matter whether the first communication is a barrier or some other message; no future communications work properly.

TEOUltimus
  • The code you posted looks fine to me. What version of MPI are you using? – suszterpatt May 05 '12 at 21:43
  • With openmpi 1.5.5 it works fine for me. – chemeng May 05 '12 at 22:46
  • I know it's openmpi, but I cannot seem to figure out what version number. Is there a command that tells you? – TEOUltimus May 05 '12 at 23:53
  • `mpirun --version` will give you the version of openmpi – Dima Chubarov May 06 '12 at 02:50
  • How many ranks are you using to test the code? – Stan Graves May 06 '12 at 03:23
  • Version is 1.5.3; I have been testing it with 2 ranks. It works when the two are on the same machine and breaks when they are on different ones. – TEOUltimus May 06 '12 at 03:43
  • Could you please provide more information about your setup? In particular: what kind of interconnect do you have (TCP/IP over Ethernet, Myrinet, InfiniBand...), how many network interfaces does each node have, how are those interfaces configured? Can you also try to run your program with the following additional argument to `mpirun` and post the output: `mpirun --mca btl_base_verbose 50 ...` – Hristo Iliev May 06 '12 at 10:41
  • @TEOUltimus: "It works when the two are on the same machine and breaks when they are on different ones." This sounds like a network configuration error where messages can't reach the target machines. – suszterpatt May 06 '12 at 11:37
  • @suszterpatt: I agree that it sounds like a network issue. I'm unsure what to do about it :-/ – TEOUltimus May 06 '12 at 17:45
  • @HristoIliev: I believe that the interconnect is TCP/IP over Ethernet, but I am not certain of that. Here is a link with the output of the command you asked me to run (http://pastebin.com/CN8PSRUk). – TEOUltimus May 06 '12 at 17:46

1 Answer


Open MPI has a known feature when it uses TCP/IP for communications: it tries to use all configured network interfaces that are in the "UP" state. This becomes a problem if some of the other nodes are not reachable through all of those interfaces. It is part of the greedy communication optimisation that Open MPI employs, and sometimes, as in your case, it leads to problems.

It seems that at least the second node has more than one interface that is up, and that this fact was advertised to the first node during the negotiation phase:

  • one configured with 128.2.100.167
  • one configured with 192.168.109.1 (do you have a tunnel or Xen running on the machine?)

The barrier communication happens over the first network, and then the next MPI_Send tries to send to the second address over the second network, which obviously does not connect all nodes.
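
If you want to double-check which interfaces are up on each node, listing them should make the mismatch obvious (on older systems ifconfig -a shows the same information):

ip addr show up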

The easiest solution is to tell Open MPI to use only the network that connects your nodes. You can tell it to do so using the following MCA parameter:

--mca btl_tcp_if_include 128.2.100.0/24

(or whatever your communication network is)

You can also specify the list of network interfaces if it is the same on all machines, e.g.

--mca btl_tcp_if_include eth0

or you can tell Open MPI to specifically exclude certain interfaces (but you must always tell it to exclude the loopback "lo" if you do so):

--mca btl_tcp_if_exclude lo,virt0
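
Putting it together, a complete command line would look something like this (the executable name and host list are of course placeholders for your own):

mpirun --mca btl_tcp_if_include eth0 -np 2 -host node1,node2 ./your_program

The same parameter can also be set through the environment as OMPI_MCA_btl_tcp_if_include or placed in $HOME/.openmpi/mca-params.conf so that you don't have to repeat it on every run.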

Hope that helps you and the many others who appear to have the same problem around here on SO. It looks like recently almost all Linux distros have started bringing up various network interfaces by default, and that is likely to cause problems with Open MPI.

P.S. Put those nodes behind a firewall, please!

Hristo Iliev
  • Thank you! The second fix you recommended (specifying the network interface) worked beautifully. Also, the cluster I am running this on is behind my school's firewall. I'm pretty sure they know what they're doing. :D – TEOUltimus May 06 '12 at 21:05