
I am trying to start an ipyparallel cluster using MPI.

The ipcluster_config.py has the following lines modified:

c.MPILauncher.mpi_cmd = ['mpiexec']
c.MPIControllerLauncher.controller_args = ['--ip=*']
c.MPILauncher.mpi_args = ["-machinefile", "~/mpi_hosts"]
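
One thing that may be worth checking in the lines above: if the launcher passes these arguments to mpiexec as a plain list without going through a shell (an assumption here, not something the logs confirm), the "~" in the machinefile path might never be expanded. A minimal sketch of expanding it in Python first; `build_mpi_args` is a hypothetical helper, not part of ipyparallel:

```python
import os


def build_mpi_args(machinefile):
    """Return mpiexec arguments with a leading "~" in the machinefile path expanded.

    Hypothetical helper for illustration; not an ipyparallel function.
    """
    return ["-machinefile", os.path.expanduser(machinefile)]


# Possible use inside ipcluster_config.py, where `c` is the config object:
# c.MPILauncher.mpi_args = build_mpi_args("~/mpi_hosts")
```

An absolute path is returned unchanged, so the helper is harmless if the path is already fully specified.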

The ipcontroller_config.py is configured as follows:

c.HubFactory.engine_ip = '*'
c.HubFactory.ip = '*'
c.HubFactory.client_ip = '*'

However, when I launch the cluster with `ipcluster start --profile mpi -n 2`, it fails with the following message:

Engines shutdown early, they probably failed to connect.
You can set this by adding "--ip='*'" to your ControllerLauncher.controller_args

I am not sure how to debug this further.

Kabira K
  • Try running `ipcluster start --profile mpi -n 2 --debug` and post the logs from the same – Tarun Lalwani Nov 14 '17 at 14:50
  • Thanks Tarun. This helps. It seems ipcluster is not able to find mpiexec. I need to figure out how to configure ipcluster so it loads the modules. – Kabira K Nov 16 '17 at 17:16
  • Did you install the MPI package? – Tarun Lalwani Nov 16 '17 at 17:17
  • I am on a PBS cluster environment. I have to do module load to see mpiexec in the path. I guess when ipcluster is launching engines on remote nodes, it does not do "module load". I am looking into configs to see if there is any place to specify that. – Kabira K Nov 16 '17 at 17:20
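
Following up on the comments above: if the problem is that mpiexec is not on the PATH seen by ipcluster, one possible workaround is to point `c.MPILauncher.mpi_cmd` at an absolute path. This is a minimal sketch, not a confirmed fix. `resolve_mpiexec` and the directory in the usage comment are hypothetical, and it assumes mpiexec lives at a known location that `module load` would normally add to PATH.

```python
import os
import shutil


def resolve_mpiexec(name="mpiexec", extra_dirs=()):
    """Return the absolute path of `name`, or None if it cannot be found.

    Searches `extra_dirs` first, then the directories on the current PATH.
    Hypothetical helper for illustration; not an ipyparallel function.
    """
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    # Drop empty entries so the current directory is not searched by accident.
    candidates = [d for d in [*extra_dirs, *path_dirs] if d]
    return shutil.which(name, path=os.pathsep.join(candidates))


# Possible use inside ipcluster_config.py, where `c` is the config object.
# The directory below is a placeholder for wherever the MPI module installs
# its binaries on the cluster.
# mpiexec = resolve_mpiexec(extra_dirs=["/path/to/mpi/bin"])
# if mpiexec is not None:
#     c.MPILauncher.mpi_cmd = [mpiexec]
```

This only helps ipcluster locate the mpiexec binary itself. If the engines also depend on environment variables set by `module load` (library paths, for example), those would still need to be provided some other way.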

0 Answers