
It was my understanding that MPI communicators restrict the scope of communication, such that a message sent on one communicator should never be received on a different one.

However, the program inlined below appears to contradict this.

I understand that the MPI_Send call returns before a matching receive is posted because of the internal buffering done under the hood (as opposed to MPI_Ssend). I also understand that MPI_Comm_free doesn't destroy the communicator right away, but merely marks it for deallocation, deferring the actual destruction until any pending operations on it have completed. I suppose my unmatched send operation will be pending forever, but then I wonder how the same object (integer handle value) can be reused for the second communicator!?

Is this normal behaviour, a bug in the MPI library implementation, or is my program simply incorrect?

Any suggestions are much appreciated!

LATER EDIT: posted follow-up question


#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char* argv[]) {
    int  rank, size;
    MPI_Group group;
    MPI_Comm my_comm;

    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_group(MPI_COMM_WORLD, &group);

    MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
    if (rank == 0) printf("created communicator %d\n", my_comm);

    if (rank == 1) {
        int msg = 123;
        MPI_Send(&msg, 1, MPI_INT, 0, 0, my_comm);
        printf("rank 1: message sent\n");
    }

    sleep(1);
    if (rank == 0) printf("freeing communicator %d\n", my_comm);
    MPI_Comm_free(&my_comm);

    sleep(2);

    MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);
    if (rank == 0) printf("created communicator %d\n", my_comm);

    if (rank == 0) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 1, 0, my_comm, MPI_STATUS_IGNORE);
        printf("rank 0: message received\n");
    }

    sleep(1);
    if (rank == 0) printf("freeing communicator %d\n", my_comm);
    MPI_Comm_free(&my_comm);

    MPI_Finalize();
    return 0;
}

outputs:

created communicator -2080374784
rank 1: message sent
freeing communicator -2080374784
created communicator -2080374784
rank 0: message received
freeing communicator -2080374784
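
To make the eager-send point above concrete, here is a condensed sketch (my own minimal variant, not part of the original program): MPI_Ssend completes only once a matching receive has been posted on the same communicator, so the synchronous version blocks at the send instead of ever reaching MPI_Comm_free, which is what the comments below report for Intel MPI and MVAPICH2.

#include <mpi.h>

/* Condensed sketch (assumes at least 2 ranks): the eager MPI_Send is replaced
 * by MPI_Ssend, which can only complete once a matching receive is posted on
 * the same communicator. No such receive ever exists here, so rank 1 blocks
 * at the send and never reaches MPI_Comm_free. */
int main(int argc, char* argv[]) {
    int rank, msg = 123;
    MPI_Group group;
    MPI_Comm my_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_group(MPI_COMM_WORLD, &group);
    MPI_Comm_create(MPI_COMM_WORLD, group, &my_comm);

    if (rank == 1)
        MPI_Ssend(&msg, 1, MPI_INT, 0, 0, my_comm);  /* blocks forever */

    MPI_Comm_free(&my_comm);  /* rank 1 never gets here */
    MPI_Group_free(&group);
    MPI_Finalize();
    return 0;
}
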
i.adri
  • Interesting - I note this behaves differently in OpenMPI. I'm not sure if this is a bug per se, or the result of undefined behaviour (e.g., it's not clear to me whether it's valid to have free'd the communicator with that Send still obviously in flight). The re-using of the integer representation for the communicator doesn't really mean much one way or another; that's just a handle to an opaque object. Re-use of that is analogous to (I expect, _very_ analogous to) re-using a table entry that's recently been freed. – Jonathan Dursi May 22 '14 at 14:07
  • Unsurprisingly, under IntelMPI and mvapich2, this program correctly hangs at rank 1's send when using Ssend or messages larger than the eager limit. Presumably MPICH2, defensibly, doesn't view the send on my_comm as "pending" when it's been sent eagerly so proceeds and frees my_comm (OMPI hangs at comm_free). But I don't know if the resulting unexpected result is (a) undefined behaviour due to invalid early freeing of the communicator in user code (eg, like modifying the send buffer before you know it's been received), (b) incorrect MPICH2 behaviour, or (c) genuine ambiguity in the standard. – Jonathan Dursi May 22 '14 at 16:30
  • ..my mistake; in OpenMPI it hangs on the receive. – Jonathan Dursi May 22 '14 at 16:54
  • This is likely a bug in MPICH (on which Intel MPI is based). Can you report this example to discuss@mpich.org? – kraffenetti May 22 '14 at 17:51
  • @kraffenetti - Thanks! I didn't know Intel MPI was based on MPICH; I'll try reporting it there. – i.adri May 22 '14 at 20:49

1 Answer


The number you're seeing is simply a handle for the communicator, and it's safe for the implementation to reuse that handle once you've freed it.

As to why you're able to send the message, look at how you're creating the communicator. MPI_Comm_group gives you the group of ranks associated with the specified communicator; since you query MPI_COMM_WORLD, that group contains all of the ranks. You then pass that same group to MPI_Comm_create, so your new communicator again spans every rank in MPI_COMM_WORLD.

If you want your communicator to contain only a subset of ranks, you'll need to use a different function (or several) to build the desired group(s). I'd recommend reading through Chapter 6 of the MPI Standard; it includes all of the functions you'll need, so pick what fits the communicator you want to build.
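For illustration, a minimal sketch of one such approach, using MPI_Group_incl (keeping only the even world ranks is just an example choice, not something from the question):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal sketch: build a communicator containing only the even ranks of
 * MPI_COMM_WORLD via the group-manipulation functions from Chapter 6. */
int main(int argc, char* argv[]) {
    int rank, size;
    MPI_Group world_group, even_group;
    MPI_Comm even_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Ranks 0, 2, 4, ... of MPI_COMM_WORLD will form the new group. */
    int nevens = (size + 1) / 2;
    int* evens = malloc(nevens * sizeof(int));
    for (int i = 0; i < nevens; i++) evens[i] = 2 * i;

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, nevens, evens, &even_group);

    /* Collective over MPI_COMM_WORLD; ranks outside the group get MPI_COMM_NULL. */
    MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);

    if (even_comm != MPI_COMM_NULL) {
        int even_rank;
        MPI_Comm_rank(even_comm, &even_rank);
        printf("world rank %d is rank %d in even_comm\n", rank, even_rank);
        MPI_Comm_free(&even_comm);
    }

    MPI_Group_free(&even_group);
    MPI_Group_free(&world_group);
    free(evens);
    MPI_Finalize();
    return 0;
}

Alternatively, MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &newcomm) partitions MPI_COMM_WORLD into even and odd sub-communicators in a single call, without handling group objects explicitly.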

JamesTullos
  • Agreed about the handle, but I don't think this is right about the communicators. Communicators are _congruent_ if they have the same group and ranks, but they aren't the _same_ communicator (e.g., doing an `MPI_COMM_COMPARE` returns `MPI_CONGRUENT`, not `MPI_IDENT`). So creating a dup of `MPI_COMM_WORLD`, sending to it, and on the receiving end doing an `MPI_Recv()` from `MPI_COMM_WORLD` shouldn't (and doesn't) work. – Jonathan Dursi May 22 '14 at 14:19
  • Creating the new communicator using the same process group as MPI_COMM_WORLD was intentional. This is a simplified repro of the issue I'm facing in a more complex multi-threaded application, where we're using one communicator per thread as a way of ensuring that messages sent from thread `i` on some node will only be received by thread `i` on some other node. – i.adri May 22 '14 at 15:04
  • @i.adri, I'd recommend using MPI tags instead. Lighter weight, and a send/receive pair must have a matching tag. Simply use the thread number for the tag. – JamesTullos May 23 '14 at 14:24
  • We've created ticket 2096 on the MPICH trac for this issue. http://trac.mpich.org/projects/mpich/ticket/2096 If you'd like to be added to the cc list for progress updates, let me know. – kraffenetti May 23 '14 at 21:33
  • @kraffenetti - Thanks a lot, it looks like you were faster than me in posting it there. Yes, I'd appreciate it if you could add me to the CC list. So far, I see that the answer is that the program is incorrect and therefore the expected behaviour is undefined. However, I still think it would be desirable that such unmatched sends were cancelled when all processes involved call MPI_Comm_free(). I will return with a slightly more elaborate example to explain why I think so. – i.adri May 24 '14 at 09:28
  • @JamesTullos - While I generally agree with your suggestion, as a multi-threaded MPI application becomes more complex, encoding all the necessary addressing information in the 4 bytes provided by MPI tags starts getting out of hand and becomes unfeasible... Isn't that one of the main reasons MPI communicators were introduced in the first place? – i.adri May 24 '14 at 09:40
  • @i.adri - The main purposes I see for additional communicators are to facilitate using collectives on only a subset of processes and to take better advantage of topology. – JamesTullos May 30 '14 at 17:45
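
To make the MPI_CONGRUENT versus MPI_IDENT distinction from Jonathan Dursi's comment above concrete, here is a minimal sketch (my own example, not his code): a duplicate of MPI_COMM_WORLD has the same group and rank order but is a distinct communication context.

#include <mpi.h>
#include <stdio.h>

/* Minimal sketch: MPI_Comm_dup produces a communicator with the same group and
 * ranks as MPI_COMM_WORLD, yet MPI_Comm_compare reports it as MPI_CONGRUENT,
 * not MPI_IDENT, because it is a separate communication context. */
int main(int argc, char* argv[]) {
    int rank, result;
    MPI_Comm dup_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_dup(MPI_COMM_WORLD, &dup_comm);

    MPI_Comm_compare(MPI_COMM_WORLD, dup_comm, &result);
    if (rank == 0)
        printf("WORLD vs dup   congruent? %d\n", result == MPI_CONGRUENT);  /* 1 */

    MPI_Comm_compare(MPI_COMM_WORLD, MPI_COMM_WORLD, &result);
    if (rank == 0)
        printf("WORLD vs WORLD identical? %d\n", result == MPI_IDENT);      /* 1 */

    MPI_Comm_free(&dup_comm);
    MPI_Finalize();
    return 0;
}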
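
A minimal sketch of the tag-based matching suggested in the comments above (my own example, not code from the original application): a receive posted with tag i matches only messages sent with tag i, so using the thread index as the tag gives each thread pair its own logical channel over a single communicator.

#include <mpi.h>
#include <stdio.h>

/* Minimal sketch (assumes at least 2 ranks): a receive posted with tag 1 does
 * not match a message sent with tag 0, so distinct tags act as separate
 * logical channels on one communicator. Relies on the tiny messages being
 * sent eagerly, as discussed in the question above. */
int main(int argc, char* argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        int a = 10, b = 20;
        MPI_Send(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);  /* tag 0 */
        MPI_Send(&b, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);  /* tag 1 */
    } else if (rank == 0) {
        int x, y;
        /* Post the tag-1 receive first: tag matching, not arrival order,
         * decides which send each receive pairs with. */
        MPI_Recv(&y, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("tag 0 carried %d, tag 1 carried %d\n", x, y);  /* 10 and 20 */
    }

    MPI_Finalize();
    return 0;
}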