I have written a large Fortran program (using the new standard) and I am currently trying to make it run faster. I have managed to streamline most of the routines using gprof, but I have one very large subroutine, which organizes the calculation, that now takes almost 50% of the CPU time. I am sure there are several bottlenecks inside this routine, but I have not found any compile or run options that let me see where the time is spent inside it. I would like at least a simple count of how many times each line is executed, or of how much CPU time is spent executing each line. Maybe valgrind is a better tool? It was very useful for eliminating memory leaks.
-
I do not know what you call "the new standard", but you used the [tag:fortran90] tag. Note that 1990 was almost 30 years ago and there have been several new standard revisions after Fortran 90. Please use the [tag:fortran] tag for all Fortran questions and add a specific version tag only when necessary - that is not the case here, as profiling and debugging tools do not care whether it is the old Fortran 90 or 2003 or 2018, and many more people follow the Fortran tag. – Vladimir F Героям слава Oct 17 '19 at 22:29
-
Tool recommendation is not really on-topic at this site, but I had some good results with Oracle Performance Analyzer - even when used with gfortran or other compilers. Gprof can give some valuable insight, but not really per line, only per function. – Vladimir F Героям слава Oct 17 '19 at 22:32
-
You did not quantify what 50% means. If you're trying to shave a few microseconds off a millisecond, then you're wasting your time. If it is hours or days, then read your documentation about `cpu_time` and sprinkle calls throughout the subroutine recording times for various sections. – evets Oct 18 '19 at 04:32
-
One of the reasons (I think) that SO discourages questions asking for suggestions for tools is that search engines are very good at that. Try searching for the term *profiling fortran code* using your favourite search engine, you'll find much of interest quite quickly. – High Performance Mark Oct 18 '19 at 06:14
-
If you're on Linux, try the Linux "perf" statistical profiler. See https://perf.wiki.kernel.org/index.php/Main_Page. Basic usage is very simple: `perf record ./a.out` to collect a profile, then `perf report` to view it. – janneb Oct 18 '19 at 09:20
-
Searching related questions on profiling, I found that there is a utility called `gcov` in GCC which provides a line-by-line execution summary of the code. I will try that to see if it gives some indication of where the speed of my big subroutine can be improved. – Bo Sundman Nov 14 '19 at 09:11
3 Answers
The `gcov` tool in GCC provides a nice overview of an individual subroutine in my code, showing how many times each line is executed. The file with the subroutine to be "covered" must be compiled with

gfortran -c -fprofile-arcs -ftest-coverage -g subr.F90

and when linking the program I must add `-lgcov` as the LAST library. After running the program I can use

gcov subr.F90

to create a file `subr.F90.gcov` with the number of times each line in the subroutine has been executed. That should make it possible to discover bottlenecks in the subroutine. This is a nice complement to `gprof`, which gives the time spent in each subroutine; as my program has more than 50,000 lines of code, it is nice to be able to select just a few subroutines for this line-by-line investigation.
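Putting the workflow together, a minimal session might look like the following (the file names `main.F90` and `prog` are placeholders for illustration; only `subr.F90` appears in the answer above):

gfortran -c -fprofile-arcs -ftest-coverage -g subr.F90   # instrument only the routine of interest
gfortran -c main.F90                                     # the rest of the program compiled normally
gfortran -o prog main.o subr.o -lgcov                    # -lgcov as the last library
./prog                                                   # running the program writes the counts (subr.gcda)
gcov subr.F90                                            # produces the annotated subr.F90.gcov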

A workaround that I have found is to use the intrinsic subroutine `cpu_time`. Although this does not do any profiling automatically, if you are willing to invest some manual effort you can call `cpu_time` before and after the statement(s) you want to profile. The difference between the two times gives you the total time needed to execute the statement(s) between the two calls to `cpu_time`. If the statements are inside a loop, you can accumulate these differences and print the total time outside the loop.
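A minimal sketch of this technique (the variable names and the timed statement are illustrative, not from the answer):

program timing_demo
  implicit none
  integer :: i
  real :: t_start, t_finish, total, x

  total = 0.0
  x = 0.0
  do i = 1, 1000000
     call cpu_time(t_start)
     x = x + sqrt(real(i))                ! the statement(s) being profiled
     call cpu_time(t_finish)
     total = total + (t_finish - t_start) ! accumulate inside the loop
  end do
  print '(a,f10.4,a)', 'time in profiled section: ', total, ' s'
  print *, x                              ! use x so the compiler cannot remove the loop
end program timing_demo

Note that the resolution of `cpu_time` is implementation-dependent, so timing a single cheap statement this way can be dominated by measurement noise; it works best for sections that take at least milliseconds per call.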
This is a little old-school, but I like the OProfile Linux toolset. If you have a Fortran program `prog`, then running

operf -gl prog

will run `prog` and also use kernel profiling to produce a profile and call graph of `prog`. These can then be fed to something like KCachegrind to view them as a nice nested rectangle plot. For converting from operf output to KCachegrind input I use a slightly modified version of this Python script.
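For a quick text-only view of the same data, OProfile's own reporting tools can be used after the `operf` run (a sketch; the exact options and output format may vary between OProfile versions):

opreport --callgraph     # text summary of the profile, including caller/callee relations
opannotate --source      # per-line annotation of the source (requires compiling with -g)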
