
I wrote a Fortran 95 code and compiled it with gfortran. When I profiled it with gprof, the total time it reported was significantly less than the CPU time: gprof says all of the functions together used 15.77 s, while the elapsed CPU time was 1 min 28 s. An excerpt of the flat profile is shown below:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls   s/call   s/call  name
     76.67     12.09    12.09        1    12.09    12.82  __bem_mod_MOD_rradwbem
     15.60     14.55     2.46        1     2.46    15.77  MAIN__
      3.36     15.08     0.53      736     0.00     0.00  __bem_mod_MOD_dbesselh_3d
      3.11     15.57     0.49      140     0.00     0.00  __fem_mod_MOD_feasmbl
      0.70     15.68     0.11    30912     0.00     0.00  __bem_mod_MOD_bdrdn
      0.38     15.74     0.06    30915     0.00     0.00  __bem_mod_MOD_bq1n3
      0.13     15.76     0.02        2     0.01     0.01  __bem_mod_MOD_bdrdn_3d
      0.06     15.77     0.01    30912     0.00     0.00  __bem_mod_MOD_dbesselh_1d

The other functions take almost no time. The code performs many complex*16 matrix operations, most of them inside __bem_mod_MOD_rradwbem.

The code performs no I/O.

I do not understand why gprof reports so much less time than the CPU time. Is there a way to find out where the rest of the time is spent? Is it possible to bring the run time down to something close to what gprof shows? At the moment I run the (outer) loop only once, and the profile above is for that single pass, but I will need thousands of such iterations in the future.

Thanks

  • Was it parallel? What does ’time ./a.out’ command report? – Vladimir F Героям слава Nov 08 '13 at 07:07
  • Try running it again without the call to `__bem_mod_MOD_rradwbem`. This may show that you have a fixed overhead for running a program which takes a significant fraction of the CPU time but is not accounted for by `gprof`. Program performance graphs, with time as the dependent variable, generally don't pass through the origin but have a positive y-intercept. – High Performance Mark Nov 08 '13 at 09:00
  • 1) `gprof` is 30 years old, and has known problems. 2) To get an idea of how time is spent and what you can save, try [*this method*](http://stackoverflow.com/a/378024/23771). The percentages are rough, but it tells you exactly what's happening. – Mike Dunlavey Nov 08 '13 at 15:42
  • Thanks Vladimir, this code is not parallel. Mark, you are right: I am using ACML, which seems to hide the time spent in the LAPACK functions from gprof, and in my case that time is significant. Mike, thanks for the advice, I will try your method. – user2559061 Nov 19 '13 at 08:49
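
The last comment attributes the missing time to library routines that gprof did not account for. One way to check that kind of explanation without relying on gprof is to time the suspect call directly with the standard `cpu_time` and `system_clock` intrinsics. The following is only a minimal sketch: the program name, the matrix size `n`, and the use of `matmul` as a stand-in for the ACML/LAPACK call are assumptions for illustration, not taken from the original code.

```fortran
program time_hotspot
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 300            ! hypothetical matrix size
  complex(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  real(dp), allocatable :: re(:,:), im(:,:)
  real(dp) :: cpu0, cpu1
  integer :: t0, t1, rate

  allocate(a(n,n), b(n,n), c(n,n), re(n,n), im(n,n))

  ! Fill two complex*16 matrices with random data.
  call random_number(re)
  call random_number(im)
  a = cmplx(re, im, kind=dp)
  call random_number(re)
  call random_number(im)
  b = cmplx(re, im, kind=dp)

  ! Bracket the suspect operation with both timers.
  call system_clock(t0, rate)
  call cpu_time(cpu0)
  c = matmul(a, b)        ! stand-in (assumption) for the library call being checked
  call cpu_time(cpu1)
  call system_clock(t1)

  print '(a,f10.4,a)', 'CPU time : ', cpu1 - cpu0, ' s'
  print '(a,f10.4,a)', 'Wall time: ', real(t1 - t0, dp) / real(rate, dp), ' s'

  ! Use the result so the compiler cannot discard the multiplication.
  print *, 'checksum: ', sum(abs(c))
end program time_hotspot
```

Wrapping the real library calls the same way and summing the measured intervals should show whether they account for the gap between gprof's 15.77 s and the 1 min 28 s of CPU time.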

1 Answer


Have you tried tcov?

tcov does line-by-line profiling; it is very old-fashioned but still valuable in certain environments.

See http://www.amath.unc.edu/sysadmin/DOC4.0/fortran/prog_guide/8_profiling.doc.html and the manual page.

PS: Sorry, this is probably more of a comment, but I don't have the necessary 50 rep, and I think it is exactly what you were looking for.

Dr.Raghnar