
I'm using NUMA compute nodes where the network adapter (Mellanox InfiniBand HCA) is attached to the second CPU socket (and NUMA node). Is there an environment variable to simply bind all MPI processes to the second CPU socket with MVAPICH2 2.2?

The MV2_CPU_BINDING_LEVEL=socket MV2_CPU_BINDING_POLICY=bunch combination does not work, since it starts by packing processes onto the first CPU socket.

I usually end up using something like -genv MV2_CPU_MAPPING 10:11:12:13:14:15:16:17:18:19:30:31:32:33:34:35:36:37:38:39 (to use all SMT threads of the second 10-core CPU socket), but this is ugly and dependent on the number of cores.
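If nothing cleaner turns up, the list can at least be generated at launch time instead of hard-coded. A rough sketch using Linux sysfs, assuming the second package reports physical_package_id 1 and that every hardware thread on it should be used (mpirun and ./my_app are placeholders for the real launch line):

SOCKET=1
# Collect every logical CPU whose physical_package_id matches the target socket,
# sort numerically, and join with ':' as MV2_CPU_MAPPING expects.
MAPPING=$(for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    id="$cpu/topology/physical_package_id"
    [ -r "$id" ] && [ "$(cat "$id")" -eq "$SOCKET" ] && basename "$cpu" | sed 's/^cpu//'
done | sort -n | paste -s -d: -)
mpirun -genv MV2_CPU_MAPPING "$MAPPING" ./my_app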

jyvet

1 Answer


This isn't an environment variable, but if you're able to modify /etc/default/grub on your systems, then you can isolate the cores on package 0 from the scheduler, so that processes are scheduled only on the package with the HCA unless explicitly pinned elsewhere. Example for your 10-core hyper-threaded CPUs, assuming package 0 holds logical CPUs 0-9 and their SMT siblings 20-29 (which matches the numbering in your MV2_CPU_MAPPING):

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT isolcpus=0-9,20-29"
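After editing the file you still need to regenerate the GRUB configuration and reboot; the exact command is distro-specific (update-grub on Debian/Ubuntu, grub2-mkconfig on RHEL-style systems), and you can check the result afterwards:

sudo update-grub    # RHEL-style: sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
# After the reboot, confirm the kernel picked up the option:
cat /proc/cmdline
cat /sys/devices/system/cpu/isolated    # list of isolated CPUs (recent kernels)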
Rakurai
  • The idea is interesting but a bit too extreme. Other users may want to use all available cores with other apps. – jyvet Sep 21 '18 at 08:36
  • I had the exact same problem a year ago, trying to keep MPI on the package with the IB adapter to avoid the QPI link. My team was doing fine-grained latency measurements. Unfortunately, the only solution I found was the above, despite all my research into shielding. I've favorited the question, and I hope you find a better answer. – Rakurai Sep 21 '18 at 10:15