4

I have a Python script that uses mpi4py, called main_parallel.py. I can measure the run time using `time` from the CLI, but how can I produce a profile similar to what cProfile gives? I would like to see the number of calls for each part of the code. I can't use cProfile because it is only for serial code.

Thanks!

NichtJens
F.N.B

2 Answers

7

As Rob Latham said, you could use cProfile. You can save the output from each process in a different file. If you want to profile a function, you could use a decorator like this:

from mpi4py import MPI
import cProfile

def profile(filename=None, comm=MPI.COMM_WORLD):
  def prof_decorator(f):
    def wrap_f(*args, **kwargs):
      pr = cProfile.Profile()
      pr.enable()
      result = f(*args, **kwargs)
      pr.disable()

      if filename is None:
        pr.print_stats()
      else:
        filename_r = filename + ".{}".format(comm.rank)
        pr.dump_stats(filename_r)

      return result
    return wrap_f
  return prof_decorator

@profile(filename="profile_out")
def my_function():
  pass  # do something

The output of each process (one file per rank, e.g. profile_out.0) can be visualized using snakeviz.
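Before reaching for a viewer, the per-rank files can also be inspected with the standard-library `pstats` module. This is a minimal sketch, assuming the files were written by the decorator above as `profile_out.0`, `profile_out.1`, and so on; the helper names `rank_files` and `merged_stats` are made up for illustration:

```python
import glob
import pstats


def rank_files(prefix):
    """Return the per-rank profile files "<prefix>.<rank>", ordered by rank."""
    candidates = glob.glob(glob.escape(prefix) + ".*")
    # Keep only files whose suffix is a rank number (skips e.g. "<prefix>.txt").
    files = [f for f in candidates if f.rsplit(".", 1)[1].isdigit()]
    return sorted(files, key=lambda f: int(f.rsplit(".", 1)[1]))


def merged_stats(prefix):
    """Load all per-rank files into one pstats.Stats object.

    pstats.Stats accepts several file names and adds their statistics
    together, so call counts and times are summed over all ranks.
    """
    files = rank_files(prefix)
    if not files:
        raise FileNotFoundError("no profile files match {!r}".format(prefix + ".*"))
    return pstats.Stats(*files)


if __name__ == "__main__":
    # Per-rank view: the ten most expensive entries of each process.
    for path in rank_files("profile_out"):
        print("==", path)
        pstats.Stats(path).sort_stats("cumulative").print_stats(10)
    # Aggregate view over all ranks (skipped if no files are present).
    if rank_files("profile_out"):
        merged_stats("profile_out").sort_stats("cumulative").print_stats(10)
```

The per-rank view is useful for spotting load imbalance between processes, while the merged view answers the original question about total call counts.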

NichtJens
hnfl
  • Is there a reason for having `import cProfile` in the inner-most function? This way it gets re-imported every time... – NichtJens Sep 04 '17 at 23:54
  • According to [this](https://stackoverflow.com/a/3095124/6756219) the imports are cached and only imported once. However, there is no reason to keep the import in the inner-most function, so you could import it at top-level. – hnfl Sep 05 '17 at 08:40
  • Yeah, you are right. I forgot that there's no penalty for re-importing. Nevertheless, it's not the usual style ... Also, it might be nice to add the import line for the MPI singleton to make the code complete. EDIT: I proposed an edit to do so. – NichtJens Sep 05 '17 at 18:22
5

Why can't you use cProfile? Have you tried?

For MPICH, I ran like this:

$ mpiexec -l -np 4 python -m cProfile ./simple-io.py doodad 

This gives me 4 sets of output; the '-l' argument prints the MPI rank in front of each line of output. Note that the '-l' argument is MPICH-specific. Open MPI uses --tag-output. Other implementations might use something else.

I see cProfile can take a file name argument (-o). Make a per-rank output file and then process it with pstats.Stats:

% python 
Python 2.7.10 (default, Oct 14 2015, 16:09:02) 
[GCC 5.2.1 20151010] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pstats
>>> pstats.Stats("simple-io.cprofile").sort_stats('cumulative').print_stats()

This gives me lots of cProfile information, but my toy program was too tiny to give me anything useful.
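One way to get the per-rank output file mentioned above is to pick the file name inside the script. The sketch below is only an illustration: it assumes the launcher exports the rank as `OMPI_COMM_WORLD_RANK` (Open MPI) or `PMI_RANK` (MPICH), which may not hold for other implementations, and the helpers `rank_from_env`, `profile_filename` and `run_profiled` are hypothetical names. If mpi4py is already imported, `MPI.COMM_WORLD.rank` is the more portable source of the rank.

```python
import cProfile
import os

# Assumption: these variables are set by the launcher.
# OMPI_COMM_WORLD_RANK -> Open MPI, PMI_RANK -> MPICH.
RANK_VARS = ("OMPI_COMM_WORLD_RANK", "PMI_RANK")


def rank_from_env(environ=None):
    """Return the MPI rank found in the environment, or 0 if none is set."""
    environ = os.environ if environ is None else environ
    for name in RANK_VARS:
        if name in environ:
            return int(environ[name])
    return 0  # not started by a recognised launcher


def profile_filename(prefix, environ=None):
    """Build a per-rank file name such as "simple-io.cprofile.3"."""
    return "{}.{}".format(prefix, rank_from_env(environ))


def run_profiled(func, prefix, environ=None):
    """Run func() under cProfile and dump the stats to this rank's own file."""
    pr = cProfile.Profile()
    try:
        return pr.runcall(func)
    finally:
        # Written even if func raises, so a failing rank still leaves a profile.
        pr.dump_stats(profile_filename(prefix, environ))
```

With the script's entry point wrapped as `run_profiled(main, "simple-io.cprofile")`, a plain `mpiexec -np 4 python ./simple-io.py doodad` would leave one file per rank, each of which can be loaded with `pstats.Stats` as shown above.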

Rob Latham
  • My `mpiexec` (1.10.2 from the Ubuntu 16.04 package manager) doesn't seem to have the `-l` switch. Without it I am getting only one output file, apparently for rank 0. Considering my install is newer than your answer, maybe your answer is outdated? – NichtJens Sep 04 '17 at 23:52
  • Thanks. 1.10.2 comes from OpenMPI. I've updated my answer to reflect that the way to get the rank in front is implementation specific and to document the two main implementations. – Rob Latham Sep 05 '17 at 13:54
  • Great. It's a little bit weird that the apparently standardized `mpiexec` does this in such an inconsistent way. I thought its purpose was to replace `mpirun`, which suffers from exactly these inconsistencies due to not being standardized ... – NichtJens Sep 05 '17 at 18:13