
I have the following setup: a hybrid MPI/OpenMP code which runs M MPI processes with N threads each. In total there are MxN threads available.

What I would like to do, if possible, is to assign threads only to some MPI processes, not to all of them. My code would be more efficient that way, since some of the threads are just doing repetitive work.

Thanks.

armando
    I'm not sure I understand the question. Do you want to keep the `M`x`N` global number of threads unchanged, but having some MPI processes using more than `N` threads, while some other would be using less? Or do you want to achieve something else? – Gilles Nov 07 '15 at 05:51
  • Related? http://stackoverflow.com/questions/11749182/assign-two-mpi-processes-per-core – Alexander Vogt Nov 07 '15 at 09:10

2 Answers


Your question is a generalised version of this one. There are at least three possible solutions.

With most MPI implementations it is possible to start multiple executables with their own environments (contexts) as part of the same MPI job. This is called the MPMD (Multiple Program Multiple Data) or MIMD (Multiple Instruction Multiple Data) model. The syntax usually involves : (colon) as a separator:

$ mpiexec <global parameters>
          -n n1 <local parameters> executable_1 <args1> :
          -n n2 <local parameters> executable_2 <args2> :
          ...
          -n nk <local parameters> executable_k <argsk>

It launches n1 ranks running executable_1 with command-line arguments <args1>, n2 ranks running executable_2 with command-line arguments <args2>, and so on. In total n1 + n2 + ... + nk processes are started and ranks are assigned linearly:

 Ranks (from .. to) |  Executable
====================|=============
0     .. n1-1       | executable_1
n1    .. n1+n2-1    | executable_2
n1+n2 .. n1+n2+n3-1 | executable_3
...                 | ...

As a narrower case, the same executable could be specified k times in order to get k different contexts with the same executable. <local parameters> could include setting the values of specific environment variables, e.g. in your case that could be OMP_NUM_THREADS. The exact method of specifying the environment differs from one implementation to another. With Open MPI, one would do:

mpiexec --hostfile all_hosts \
        -n 5 -x OMP_NUM_THREADS=2 myprog : \
        -n 4 -x OMP_NUM_THREADS=4 myprog : \
        -n 6 -x OMP_NUM_THREADS=1 myprog

That will start 15 MPI ranks on the hosts specified in all_hosts (a global parameter), with the first five using two OpenMP threads each, the next four using four OpenMP threads each, and the last six running sequentially. With MPICH-based implementations the command would be slightly different:

mpiexec --hostfile all_hosts \
        -n 5 -env OMP_NUM_THREADS 2 myprog : \
        -n 4 -env OMP_NUM_THREADS 4 myprog : \
        -n 6 -env OMP_NUM_THREADS 1 myprog

Although widely supported, the previous method is a bit inflexible. What if one would like, e.g., all ranks except every 10th one to run sequentially? Then the command line becomes:

mpiexec ...
        -n 9 -x OMP_NUM_THREADS=1 myprog : \
        -n 1 -x OMP_NUM_THREADS=N myprog : \
        -n 9 -x OMP_NUM_THREADS=1 myprog : \
        -n 1 -x OMP_NUM_THREADS=N myprog : \
        ...

A more convenient solution would be to provide a wrapper that sets OMP_NUM_THREADS based on the process rank. For example, such a wrapper for Open MPI looks like:

#!/bin/bash
# Give every 10th rank N threads; all other ranks run sequentially
if [ $(( (OMPI_COMM_WORLD_RANK + 1) % 10 )) -eq 0 ]; then
  export OMP_NUM_THREADS=N
else
  export OMP_NUM_THREADS=1
fi
# "$@" preserves the program name and its arguments as separate words
# ("$*" would merge them into a single string and fail to execute)
exec "$@"

and is used simply as:

mpiexec -n M ... mywrapper.sh myprog <args>
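The same wrapper idea extends to arbitrary rank-to-thread mappings. As a sketch (assuming Open MPI, which exports each rank's number in OMPI_COMM_WORLD_RANK; the particular split of three ranks with two threads and the rest with four is purely illustrative):

```shell
#!/bin/bash
# Hypothetical mapping: ranks 0..2 get 2 OpenMP threads, all others get 4.
# OMPI_COMM_WORLD_RANK is set by Open MPI in each rank's environment.
rank=${OMPI_COMM_WORLD_RANK:-0}
if [ "$rank" -lt 3 ]; then
  export OMP_NUM_THREADS=2
else
  export OMP_NUM_THREADS=4
fi
# Run the actual program with its arguments, inheriting the environment
exec "$@"
```

MPICH-based implementations typically export the rank as PMI_RANK instead, so a portable wrapper would have to check both variables.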

The third and least flexible option is to simply call omp_set_num_threads() from within the program after MPI initialisation but before any parallel regions, setting a different number of threads based on the rank:

use mpi
use omp_lib

integer :: provided, rank, ierr

call MPI_INIT_THREAD(MPI_THREAD_FUNNELED, provided, ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

if (mod(rank, 10) == 0) then
   call omp_set_num_threads(N)
else
   call omp_set_num_threads(1)
end if

No matter what solution is chosen, process and thread binding becomes a bit tricky and should probably be switched off altogether.

Hristo Iliev
  • If this is not the answer you are looking for, that rusty crystal orb of mine has failed me one last time... – Hristo Iliev Nov 07 '15 at 10:54
  • Thanks for your explanation. In my code, I set the number of MPI ranks with "mpirun -n", then I use "export OMP_NUM_THREADS" to set the number of threads. However, this method is "inflexible" because the same number of threads is assigned to each MPI rank and I am wasting CPU power because many threads are idle or repeating tasks. That's why I want some scheme to assign for instance 2 thread for MPIs=1,2,3 and 4 threads for MPI=4,5,6. At some point I would need to put a barrier to sync the data from the different MPIs. The method you mentioned can overcome different threads on MPIs issue? – armando Nov 07 '15 at 15:34
  • @armando, using the method(s) outlined above you can set different number of OpenMP threads per MPI rank. How you use MPI in combination with those threads is a separate question. MPI is largely unaware of the existence of threads and all communications happen at a process level, therefore having different number of threads per process should not present a problem. That is, of course, if OpenMP threads do not make MPI calls themselves. – Hristo Iliev Nov 07 '15 at 15:51
  • If I launch several instances of my program (each instance with a different number of ranks), would there be a global communicator for all ranks and a local communicator for the ranks within each instance? – armando Dec 27 '16 at 22:47

You can mpirun your job with binding to cores. Then, during execution, you can call sched_setaffinity in your program (http://man7.org/linux/man-pages/man2/sched_setaffinity.2.html) through ISO_C_BINDING, since it is a C function.

Anthony Scemama