
Are there any good debugging and testing mechanisms for MPI programs?

Right now the only tool I have for diagnosing an MPI program is "cout << some string". It takes far too much time and gives little assurance of correctness.

I would like a test framework for MPI, much like JUnit for Java programs. This may not be possible because of the differences between the parallel and sequential paradigms, but the framework should at least let me unit-test each sequential module of my code and run integration tests to make sure the program works correctly as a whole.

I also want to be able to debug MPI programs.

By the way, I cannot afford any commercial tools. Please give me some advice. Thanks.

Shuo
  • You can use Visual Studio Code, which is free. See my answer here: https://stackoverflow.com/a/65106503/2543510 – Sorush Dec 02 '20 at 12:13

4 Answers


As Zulan has already mentioned, there are excellent parallel debuggers out there such as DDT and Totalview.

To profile your application and debug/optimise/visualise the interactions between your MPI tasks, there are tools such as Vampir and Tau.

For a testing framework, I've used standard test frameworks (such as CUnit) in a past MPI project. It does, however, require a trick to get sensible output; otherwise the stdout from the different processes gets garbled when it is combined for display. For example, assuming you're launching your jobs locally:

mpiexec -np 4 xterm -e "./your_prog arg1 arg2; read"

That will start each MPI task in its own xterm session, so the stdout of each task shows up in its own terminal. The trailing read keeps the terminal open after the run has ended. A quick ENTER in each terminal will close it once you're done.

The same technique can be used to run each of your MPI tasks through standard tools such as gdb, valgrind, etc. Some call this the poor man's debugger approach.

Shawn Chin

Correctness tools can integrate well with interactive parallel debugging. Allinea's DDT MPI debugger has a plugin system that lets you run with (say) MARMOT or the Intel Message Checker. When the correctness plugin detects an error, it is displayed inside DDT, which lets you investigate what the error really means while your processes are still alive. That improves on the postmortem analysis you would be left with if you didn't have a parallel debugger.

In addition to MARMOT and Intel's tool, there are some newer projects such as MUST (developed by TU Dresden and LLNL) and ISP (University of Utah).

David

You can also try SMPI, the Simulated MPI. It's an open-source project (in which I'm involved) that aims to reimplement the full MPI standard on top of a simulator of distributed systems.

SMPI can run many MPI applications unmodified, and forecast rather accurately the runtime of the application, provided that you have an accurate description of your hardware platform. http://simgrid.gforge.inria.fr/tutorials/simgrid-smpi-101.pdf

You can also conduct a formal assessment of your MPI application in this framework (that part is less mature and still work in progress). http://simgrid.gforge.inria.fr/tutorials/simgrid-mc-101.pdf

Martin Quinson

There are two popular commercial debuggers for MPI: DDT and Totalview. Some good general information is collected in the OpenMPI debugging FAQ.

You might also be interested in MPI correctness checking tools like Marmot. This will help you to find errors in the way your program uses MPI.

Zulan