I'm trying to launch my MPI application (Open MPI 1.4.5) with numactl. Since the load balancing with --cpunodebind apparently does not distribute my processes among the available nodes in a round-robin fashion, I want to restrict my processes explicitly to a fixed set of CPUs. That way I intend to ensure a balanced load across the nodes in terms of the number of threads running on each one. According to the numactl manual, --physcpubind seems to do the job.
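For reference, my invocation looks roughly like the following (process count, CPU list and binary name are just placeholders, the real command differs):

    # all ranks share the CPU set 0-7; each launched process is wrapped by numactl
    mpirun -np 8 numactl --physcpubind=0-7 ./my_app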
The problem is, as far as I could extract from this post, that with --physcpubind the processes are still allowed to migrate within this CPU set. Another problem is that some CPUs from the set remain unused while others get two or more processes assigned to them, which then run at only 50% CPU usage or less. Why is this happening, and is there any workaround for this behaviour?
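What I am ultimately after is each rank pinned to its own physical CPU. The following wrapper script is only a sketch of that idea; it assumes Open MPI exports OMPI_COMM_WORLD_RANK (which I believe 1.4.5 does) and that the rank number maps directly to a usable CPU id. I don't know whether this is the right approach:

    #!/bin/sh
    # bind_rank.sh - pin the calling rank to the CPU whose id equals its rank
    # (assumption: rank -> CPU id is a valid mapping on this machine)
    exec numactl --physcpubind=$OMPI_COMM_WORLD_RANK "$@"

which would then be launched as

    mpirun -np 8 ./bind_rank.sh ./my_app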
Kind regards