I'm trying to create a master/slave configuration of processes between two nodes.

Node1 spawns N processes on Node2. My problem is that when the spawned processes try to communicate with their parent, they try to connect to 127.0.1.1, which is the IP assigned to node1 in the /etc/hosts file of Node1.

My /etc/hosts files look like this:

Node1 /etc/hosts file

127.0.0.1  localhost
127.0.1.1  node1
ip.node.2  node2
...

Node2 /etc/hosts file

127.0.0.1  localhost
127.0.1.1  node2
ip.node.1  node1
...
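One way to check what these entries imply is to resolve a host name and see whether the first IPv4 address falls in the loopback range 127.0.0.0/8. The following is only a minimal sketch: the helper names `is_loopback_ipv4` and `first_ipv4_for` are hypothetical, invented for this illustration, and the code uses only the standard POSIX calls `inet_pton`, `inet_ntop`, and `getaddrinfo`. It does not use MPI and is not how MPI itself picks the address.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if ip is an IPv4 address in 127.0.0.0/8,
 * 0 if it is a valid IPv4 address outside that range,
 * and -1 if it is not a valid dotted-quad IPv4 string. */
int is_loopback_ipv4(const char *ip)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return -1;
    /* s_addr is in network byte order; the top byte is the first octet. */
    return (ntohl(addr.s_addr) >> 24) == 127;
}

/* Hypothetical helper: resolves name with the system resolver (which
 * normally consults /etc/hosts) and writes the first IPv4 address found
 * into out as a string. Returns 0 on success, -1 on failure. */
int first_ipv4_for(const char *name, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM; /* avoid duplicate entries per socket type */

    if (getaddrinfo(name, NULL, &hints, &res) != 0 || res == NULL)
        return -1;

    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    const char *p = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);
    return p != NULL ? 0 : -1;
}
```

Calling `first_ipv4_for` on Node1 with the name returned by `gethostname` and passing the result to `is_loopback_ipv4` should show whether the node's own name resolves to a loopback address. With the Node1 file above I would expect it to report 127.0.1.1, assuming the resolver reads /etc/hosts first.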

This is the error I get:

MPIR_Init_thread(506)............................: 
MPID_Init(325)...................................: spawned process group was unable to connect back to the parent on port <tag#0$description#madx$port#60313$ifname#127.0.1.1$>
MPID_Comm_connect(191)...........................: 
MPIDI_Comm_connect(834)..........................: Named port tag#0$description#madx$port#60313$ifname#127.0.1.1$ does not exist
MPIDI_Comm_connect(651)..........................: 
MPIDI_Create_inter_root_communicator_connect(324): Connection timed out in 180 seconds

And my master.c code

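#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Expect two arguments: the number of kernel and server processes. */
    if (argc < 3) {
        fprintf(stderr, "usage: %s <kernels> <servers>\n", argv[0]);
        MPI_Finalize();
        return EXIT_FAILURE;
    }

    int kernels, servers;
    char hostname[256];
    gethostname(hostname, 255);
    hostname[255] = '\0';
    //char nombre[10]; int longitud;
    kernels = atoi(argv[1]);
    servers = atoi(argv[2]);

    MPI_Comm intercomm;
    MPI_Info info[2];

    /* Both commands are placed using the same hostfile. */
    MPI_Info_create(&info[0]);
    MPI_Info_set(info[0], "hostfile", "host2.txt");
    MPI_Info_create(&info[1]);
    MPI_Info_set(info[1], "hostfile", "host2.txt");

    char *cmds[2] = {"./kernel", "./server"};
    int np[2] = {kernels, servers};

    /* The error-code array needs one entry per spawned process, not per command. */
    int *errcodes = malloc((size_t)(kernels + servers) * sizeof *errcodes);
    MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, np, info, 0, MPI_COMM_WORLD, &intercomm, errcodes);

    free(errcodes);
    MPI_Info_free(&info[0]);
    MPI_Info_free(&info[1]);
    MPI_Finalize();
    return 0;
}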

host2.txt

host2:4
Marc G.G