2

There are problems with Ansys. When I start it, it complains about some partitions. We are using Slurm. Is it complaining about the Slurm partitions in which the jobs run? RDMA, however, sounds more like a hard drive partition. I am a bit confused about the cause of the problem: is it access to the file system, or the different queues (partitions) in Slurm? And how can I fix it? Has anyone encountered this bug before and maybe knows a solution?

It is running on a Slurm cluster with an NFS /home, an NFS /opt (Ansys install), and a BeeGFS /work directory (for models etc.).

cfx5remote: Rank 0:35: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY

cfx5remote: Rank 0:35: MPI_Init_thread: pkey table:

cfx5remote: Rank 0:35: MPI_Init_thread: 0x8001

cfx5remote: Rank 0:35: MPI_Init_thread: 0x7fff

cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY

cfx5remote: Rank 0:25: MPI_Init_thread: pkey table:

cfx5remote: Rank 0:35: MPI_Init_thread: 0xffff

cfx5remote: Rank 0:25: MPI_Init_thread: 0x8001

cfx5remote: Rank 0:25: MPI_Init_thread: 0x7fff

cfx5remote: Rank 0:25: MPI_Init_thread: 0xffff

cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed

cfx5remote: Rank 0:21: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY

cfx5remote: Rank 0:25: MPI_Init_thread: Can't initialize RDMA device
Networkguy

2 Answers

2

For a tcsh shell:

setenv MPI_IB_PKEY "0xffff"

This forces the application to use the default partition key (0xffff), roughly the "broadcast" "VLAN" of the fabric. I am not sure why there is more than one partition to choose from.

For a bash shell:

export MPI_IB_PKEY="0xffff"
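
To see which keys the MPI layer actually offered before picking one, the error output can be filtered. The following is only a sketch: the function name `list_pkeys` and the idea of saving the solver output to a log file are my assumptions, not part of Ansys or Slurm.

```shell
#!/usr/bin/env bash
# Sketch: list the partition keys reported in a saved solver log, then export one.
# list_pkeys and the log file argument are hypothetical names for illustration.

list_pkeys() {
    # Keep only lines that end in a bare hex value,
    # e.g. "... MPI_Init_thread: 0x8001", print that value,
    # and de-duplicate it across ranks.
    grep -oE 'MPI_Init_thread: 0x[0-9a-fA-F]+$' "$1" | awk '{print $NF}' | sort -u
}

# Use the value suggested in this answer unless one is already set.
export MPI_IB_PKEY="${MPI_IB_PKEY:-0xffff}"
```

Called as `list_pkeys solver.log` (file name made up), it prints one key per line. If the `export` line is placed in the job script before the solver is started, the variable should reach the job, since sbatch passes the submitting environment along by default.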

Networkguy
0

cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY

cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed

-> This is InfiniBand/RDMA, and very likely totally unrelated to your file systems.
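
One way to confirm this on a compute node is to look at the partition key table the kernel exposes for the RDMA devices. A minimal sketch, assuming the usual Linux sysfs layout under /sys/class/infiniband and that unused table slots read 0x0000. The function name `show_pkeys` and its optional directory argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the non-empty partition keys exposed for each RDMA port.
# show_pkeys is a hypothetical helper. The optional argument only exists so
# the function can be pointed at a different directory tree.

show_pkeys() {
    local root="${1:-/sys/class/infiniband}"
    local f val
    for f in "$root"/*/ports/*/pkeys/*; do
        [ -r "$f" ] || continue             # no RDMA device, or the glob did not match
        val=$(cat "$f")
        [ "$val" = "0x0000" ] && continue   # assumed marker for an unused slot
        echo "${f#"$root"/}: $val"          # e.g. "<device>/ports/1/pkeys/0: 0xffff"
    done
}
```

Running `show_pkeys` with no argument on a node should list the same keys as the "pkey table" lines in the error output if the messages really come from the RDMA fabric.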

  • We have different OpenMPI versions and both an Ethernet and an Omni-Path link between the nodes. My guess would be that Ansys can't decide whether it should take the Ethernet link or the Omni-Path link. Do you have any idea how to tell Ansys to use Omni-Path? – Networkguy Dec 08 '17 at 08:31