I'm trying to run multilocale Chapel code on a cluster that has an MXM InfiniBand network (40 Gbps, model: Mellanox Technologies MT26428).
I followed both the Chapel and GASNet documentation, and I set
export CHPL_COMM_SUBSTRATE=ibv
export CHPL_LAUNCHER=gasnetrun_ibv
export GASNET_IBV_SPAWNER=mpi
instead of using CHPL_COMM_SUBSTRATE=mxm, since mxm is deprecated.
The problem is that I can build Chapel using the ibv substrate, but I cannot run on multiple locales: I receive a huge number of timeout errors.
At first, I thought the problem was the PKEY, so I added --mca btl_openib_pkey "0x8100" to MPIRUN_CMD, but without success.
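A sketch of what such a setting can look like, assuming GASNet's default mpi-spawner template of `mpirun -np %N %C` (where `%N` stands for the process count and `%C` for the command line) and Open MPI's openib BTL. The exact template on a given cluster may differ:

```shell
# Hypothetical MPIRUN_CMD: the template placeholders are an assumption,
# only the --mca option and the PKEY value come from the description above.
export MPIRUN_CMD='mpirun -np %N --mca btl_openib_pkey 0x8100 %C'
```

Note that this option only affects the MPI job used for spawning, not the InfiniBand traffic GASNet generates through the ibv substrate.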
I also tried to use the deprecated mxm configuration:
CHPL_LAUNCHER=gasnetrun_mxm
export CHPL_LAUNCHER=gasnetrun_ibv
export GASNET_MXM_SPAWNER=mpi
However, I cannot build Chapel with that configuration. This is the error message:
"User requested --enable-mxm, but I don't know how to build mxm programs for your system."
By the way, using GASNet on top of MPI, UDP, and InfiniBand without a Partition Key works just fine.
Does anybody know how to use Chapel on a cluster equipped with an MXM InfiniBand network and a Partition Key (PKEY)?
Best Regards,
Tiago Carneiro.