
On my local cluster, I could parallelize my OpenMP code across 36 cores using this script:

#$ -S /bin/bash
#$ -N name_of_project
#$ -o output.out
#$ -pe orte 36
#$ -V
#$ -cwd

export OMP_NUM_THREADS=36
./my_programme

I could run an OpenMP C++ code across 36 cores with 4 nodes...
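The parallelism in my code is essentially just loop parallelism. A minimal sketch of the kind of loop I mean (placeholder data, not my actual programme):

#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1000000;                 // placeholder problem size
    std::vector<double> data(n, 1.0);      // placeholder data, not the real workload
    double sum = 0.0;

    // OpenMP splits the iterations among the threads set via OMP_NUM_THREADS
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += data[i] * data[i];

    std::printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}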

However, on a supercomputing facility that is part of XSEDE:

https://portal.xsede.org/tacc-stampede

I was informed that I could only run OpenMP across 1 node with 16 cores. I am a bit confused: if I would like to parallelize my programme with effectively more than 17 threads, do I have to recode my programme into an MPI programme?

I would like to ask how difficult it is to convert an OpenMP programme into an MPI programme. Thank you.

  • You should not try to convert OpenMP to MPI. OpenMP is a shared memory programming model. MPI is primarily a distributed memory programming model. Going from shared to distributed data structures requires careful design. – Jeff Hammond Sep 04 '15 at 04:01
  • OpenMPI is an implementation of MPI. It's confusing, but please try to distinguish the two. – Jeff Hammond Sep 04 '15 at 04:02
  • But I will need very heavy parallelization in my code (500+ threads are required)... It seems the only viable way is to do the conversion... – wasabi123 Sep 04 '15 at 06:05
  • Sorry, I am trying to say that you cannot evolve OpenMP into good MPI. You need to do a ground-up design for MPI. – Jeff Hammond Sep 04 '15 at 06:06
  • Ehm... that is sad... because the parallelization is as simple as for-loop parallelization... – wasabi123 Sep 04 '15 at 06:09
  • You might want to talk to someone from the HPC user support staff, and perhaps google for MPI alternatives before getting into this. If you really need to scale beyond 500 threads you might need to learn MPI. MPI is very different from OpenMP. Depending on what you need to do, it can be hard and take time. – Ronny Brendel Sep 04 '15 at 07:30
  • When I google "openmp to mpi", some interesting search results already pop up. Maybe you can find an easy solution if you research the matter a bit more. – Ronny Brendel Sep 04 '15 at 07:40
  • You can implement distributed loop parallelism using Global Arrays or a conceptually equivalent implementation using MPI-3. – Jeff Hammond Sep 05 '15 at 19:40

1 Answer


If I would like to parallelize my programme with effectively more than 17 threads, do I have to recode my programme into an MPI programme?

Yes, you would need to write some MPI code in order to exploit all the nodes at your disposal. OpenMP targets shared-memory architectures, and you need a message-passing library to communicate between the nodes.

Parallelizing for distributed-memory architectures is different (you cannot just add a for-loop parallelization pragma as in OpenMP), since each node has its own memory and no node automatically knows the state of the others. You have to organize the communication and synchronization of the work yourself.
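To give you an idea, the minimal skeleton of an MPI programme looks like this (a rough sketch, nothing specific to your code):

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);                // start the MPI runtime

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // id of this process
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes

    // Each process only sees its own memory; any data another process needs
    // must be exchanged explicitly (MPI_Send/MPI_Recv, collectives, ...).
    std::printf("process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

You compile it with the MPI compiler wrapper (e.g. mpicxx) and start it with the site's MPI launcher, which runs one copy of the programme per process.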

I would like to ask how difficult it is to convert an OpenMP programme into an MPI programme?

MPI parallelization can be quite straightforward, depending on your application and the way you wrote your code. You would need to detail your algorithms to judge that. Broadly, there are two cases:

  • Embarrassingly parallel problem with a static workload: every MPI process has the same amount of work and does the same job, with no or very few interactions with other processes. If your application falls into this category, the parallelization is straightforward and can be done with collective MPI routines (see the sketch after this list). Still, you will need to write some code and understand how MPI works.
  • More complex parallel problem / dynamic workload: your problem needs synchronization, some communication between processes, and/or the amount of work is unknown in advance and you need a load-balancing strategy. This is what HPC dudes do for a living :)
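For instance, if your work really is as simple as a for loop, the first category could look like the following sketch (the per-iteration work f() and the problem size are placeholders, not your actual computation):

#include <mpi.h>
#include <cstdio>

// placeholder for the per-iteration work
double f(long i) { return static_cast<double>(i); }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 1000000;                 // placeholder problem size
    // static partitioning: each process takes a contiguous block of iterations
    const long begin = rank * n / size;
    const long end   = (rank + 1) * n / size;

    double local = 0.0;
    for (long i = begin; i < end; ++i)
        local += f(i);

    // collective routine: combine the partial results on rank 0
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}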

I hope your problem falls into the first category!

In the end, the fun starts here: to get a good speedup you will need to find compromises and experiment, since you will end up with a hybrid OpenMP/MPI parallelization.
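A hybrid sketch simply combines the two levels: MPI splits the iterations across processes (typically one per node) and OpenMP splits each process's share across the cores of that node. Again, this is only illustrative:

#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    // request a threading level that allows OpenMP inside each MPI process
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n = 1000000;                 // placeholder problem size
    const long begin = rank * n / size;     // MPI level: split across processes
    const long end   = (rank + 1) * n / size;

    double local = 0.0;
    // OpenMP level: split this process's block across its threads
    #pragma omp parallel for reduction(+:local)
    for (long i = begin; i < end; ++i)
        local += static_cast<double>(i);

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("total = %f (%d processes x up to %d threads each)\n",
                    total, size, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}

On a machine like Stampede this would typically mean one MPI process per node with OMP_NUM_THREADS set to the number of cores per node, but the exact process/thread layout and launch command depend on the site's batch system, so check their documentation.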

coincoin