Which profiler do you use for a Fortran code base that uses MPI? gprof doesn't seem to be working correctly. Sun Studio Analyzer only returns the timings for the C/C++ system calls, and none of the Fortran functions appear.
-
What is wrong with `gprof`? I use it to profile my MPI programs without problems. Did you compile the objects you want to profile with `-pg`? – milancurcic Aug 12 '13 at 16:08
-
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. – Sergey K. Aug 12 '13 at 16:08
-
`gprof` is OK if your call tree is pretty shallow, and it is blind to time spent in I/O, if you have any. I use [*this method*](http://stackoverflow.com/a/378024/23771) which works with GDB in Fortran. I turn off the MPI, do the performance tuning, then turn MPI back on. – Mike Dunlavey Aug 12 '13 at 16:59
-
@IRO-bot: I have used gprof correctly, but either the slave and master logs come out incorrectly, or most of the functions which are sometimes very obviously time-consuming do not show up. So I wanted to see if there are other options. Thanks. – SkypeMeSM Aug 12 '13 at 18:30
-
@SergeyK. Thank you for your point. I am actually hoping to know any other possible profilers/techniques which people use and have found reliable, other than gprof and Sun Studio Suite. I hope I am not asking for recommendations but options. If it is opinionated, I hope I can find out for myself. However, I understand that this type of question does have a chance of getting spam. Please feel free to vote for closure, if you think that is appropriate. Thanks. – SkypeMeSM Aug 12 '13 at 18:48
-
@IRO-bot One more thing to add. I have used gprof with C/C++ and MPI earlier and it has worked fine always. I am just not getting correct results with fortran, and wanted to see if there are other options which you guys might have used. Thanks. – SkypeMeSM Aug 12 '13 at 18:49
-
Some tools which are good for profiling MPI codes are listed [in this community wiki answer](http://stackoverflow.com/questions/10607750/tools-to-measure-mpi-communication-costs/10608276#10608276) – Jonathan Dursi Aug 12 '13 at 19:25
-
@SergeyK. Right, the answers below illustrate your point. ;) – Jan 30 '14 at 04:31
6 Answers
There are a number of performance analysis tools specialized for Parallel/MPI Programs, such as:
- Score-P, which works with a number of different Analysis tools, e.g. Cube, Vampir
- HPCToolkit uses sampling only, so you do not have to recompile your application
- Tau
At first they may not be as simple to use, but they provide much more help in investigating the performance of parallel applications.

When the questioner says "gprof doesn't seem to be working correctly", perhaps he's referring to the fact that N MPI processes might clobber the gmon.out file. In that case, the (undocumented) GMON_OUT_PREFIX environment variable might make gprof more useful:
$ export GMON_OUT_PREFIX=gmon.out
$ mpiexec -np 4 cpi
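The full workflow around those two commands could look roughly like the sketch below. It is only an illustration under assumptions: `my_app` and `my_app.f90` are hypothetical names, `mpif90` and `mpiexec` may be called something else in your MPI installation, and the launcher is assumed to pass the exported variable on to every rank. The `-s` option of gprof sums several profile data files into `gmon.sum`.

```shell
#!/bin/sh
# Sketch only. my_app and my_app.f90 are hypothetical names; mpif90 and
# mpiexec are common MPI wrapper names but may differ on your system.

# Compile and link with -pg so gprof instrumentation is present.
build_profiled() {
    mpif90 -pg -o my_app my_app.f90
}

# With GMON_OUT_PREFIX set, each process writes <prefix>.<pid>
# instead of all ranks overwriting a single gmon.out.
run_profiled() {
    export GMON_OUT_PREFIX=gmon.out
    mpiexec -np 4 ./my_app
}

# Map a data file such as gmon.out.1234 to profile.1234.txt.
report_name() {
    printf 'profile.%s.txt\n' "${1##*.}"
}

# Write one report per process, then a summed report over all of them.
report_all() {
    for f in gmon.out.*; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        gprof ./my_app "$f" > "$(report_name "$f")"
    done
    gprof -s ./my_app gmon.out.*         # -s merges the data into gmon.sum
    gprof ./my_app gmon.sum > profile.sum.txt
}
```

Calling `build_profiled`, `run_profiled` and `report_all` in that order should leave one `profile.<pid>.txt` per process plus `profile.sum.txt`. Comparing the per-process reports is one way to check whether the master and slave profiles really differ or whether one file was simply overwritten.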

Allinea MAP is a profiler that is simple and straightforward but very powerful.
It is designed to show the performance problems in Fortran, C and C++ MPI applications, and requires very little effort to get started and get profiling.
It is graphical, has an integrated source code browser that shows performance against lines of code, and is able to analyse bad MPI behaviour, poor work balance or poor vectorization.
I am one of the team behind the product, so am a little biased. It is commercial - there are evaluation licences available from the website.

gprof is a good profiler for Fortran and other code built with GNU-based compilers.

-
@CyrilDuchon-Doris He probably did read it, and his answer is a counterargument to SkypeMeSM's question, since the question gives no details or explanation of why gprof doesn't work correctly. – Kamil Kiełczewski Feb 03 '18 at 07:21
You can use Intel Trace Analyzer to profile MPI communication and Intel VTune to obtain a profile of a single MPI task. Both tools are widely documented on the Intel website.

I would like to add two more profilers: (1) mpiP is a lightweight profiler that produces textual output but measures only MPI functions. (2) Scalasca produces more sophisticated output that can also point to synchronisation imbalances (late sender / late receiver), as opposed to TAU, which does not point to synchronisation imbalances.
