
While trying to run a hybrid MPI/OpenMP application, I realized that the number of OpenMP threads was always 1, even though I exported OMP_NUM_THREADS=36. I built a small C++ example that shows the issue:

#include <cmath>
#include <vector>

int main ()
{
    int n = 4000000, m = 1000;
    double x = 0, y = 0;
    std::vector<double> shifts(n, 0);

    // The outer loop is the one that should be split across OpenMP threads.
    #pragma omp parallel for reduction(+:x,y)
    for (int j = 0; j < n; j++) {

        double r = 0.0;
        for (int i = 0; i < m; i++) {

            double rand_g1 = std::cos(i / double(m));
            double rand_g2 = std::sin(i / double(m));

            x += rand_g1;
            y += rand_g2;
            r += std::sqrt(rand_g1 * rand_g1 + rand_g2 * rand_g2);
        }
        shifts[j] = r / m;
    }
}

I compile the code using g++:

g++ -fopenmp main.cpp
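
For reference, a minimal sketch that asks the OpenMP runtime how many threads it will use (omp_get_max_threads and omp_get_num_threads are standard OpenMP calls) can be compiled the same way:

#include <cstdio>
#include <omp.h>

int main()
{
    // Outside any parallel region: the team size the runtime plans to use.
    std::printf("omp_get_max_threads() = %d\n", omp_get_max_threads());

    // Inside a parallel region: the team size that was actually created.
    #pragma omp parallel
    #pragma omp master
    std::printf("omp_get_num_threads() = %d\n", omp_get_num_threads());
}

Run directly, it should print the value of OMP_NUM_THREADS; the interesting question is what it prints under mpirun.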

OMP_NUM_THREADS is still set to 36. When I run the code with just:

time ./a.out

I get a run-time of about 6 seconds and htop shows the command using all 36 cores of my local node, as expected. When I run it with mpirun:

time mpirun -np 1 ./a.out

I get a run-time of 3m20s and htop shows the command is using only one core. I've also tried mpirun -np 1 -x OMP_NUM_THREADS=36 ./a.out, but the result was the same.
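
One way to narrow this down is to look at the CPU affinity mask the process starts with. A minimal Linux/glibc sketch using sched_getaffinity (g++ predefines _GNU_SOURCE, which <sched.h> needs for CPU_COUNT) would be:

#include <cstdio>
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_COUNT (Linux/glibc)

int main()
{
    cpu_set_t mask;
    // pid 0 = query the calling process.
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
        std::printf("cores in affinity mask: %d\n", CPU_COUNT(&mask));
}

If the launcher restricts the process, this should report fewer cores under mpirun than when the program is started directly.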

I am using GCC 9.2.0 and OpenMPI 4.1.0a1. Since this is a developer version, I've also tried with OpenMPI 4.0.3 with the same result.

Any idea what I am missing?

solalito
  • Maybe this can add some further insight: https://stackoverflow.com/a/47944979/7678171 – noma Apr 04 '20 at 12:31

1 Answer


The default behavior of Open MPI is to

  • bind an MPI task to a core if there are two or fewer MPI tasks
  • bind an MPI task to a socket otherwise

So you really should run

mpirun --bind-to none -np 1 ./a.out

so your MPI task can access all the cores of your host.
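
You can also ask mpirun itself what it did: the --report-bindings option prints each task's binding at launch, for example

mpirun --report-bindings --bind-to none -np 1 ./a.out

which should report the task as not bound, leaving the OpenMP runtime free to use all the cores.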

Gilles Gouaillardet
  • Thanks! That solved the issue. I did not mention this, but a few months ago hybrid MPI/OpenMP was working fine without that flag (I think with version 3.x). Is binding an MPI task to a core by default a new feature in 4.x? – solalito Apr 03 '20 at 12:21
  • IIRC, 3.x has the same behavior with respect to binding. – Gilles Gouaillardet Apr 03 '20 at 13:25