3

I was learning to use gprof and then i got weird results for this code:

int one(int a, int b)
{
    int i, r = 0;
    for (i = 0; i < 1000; i++)
    {
        r += b / (a + 1);
    }
    return r;
}

int two(int a, int b)
{
    int i, r = 0;
    for (i = 0; i < 1000; i++)
    {
        r += b / (a + 1);
    }
    return r;
}

int main()
{
    for (int i = 1; i < 50000; i++)
    {
        one(i, i * 2);
        two(i, i * 2);
    }
    return 0;
}

and this is the profiler output

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 50.67      1.14     1.14    49999    22.80    22.80  two(int, int)
 49.33      2.25     1.11    49999    22.20    22.20  one(int, int)

If i call one then two the result is the inverse, two takes more time than one
both are the same functions, but the first calls always take less time then the second

Why is that?

Note: The assembly code is exactly the same and code is being compiled with no optimizations

arthurprs
  • 4,457
  • 3
  • 26
  • 28
  • 1
    Those are very small differences, and could be due to any number of things. If you're learning to use gprof, you should be aware of its shortcomings: http://stackoverflow.com/questions/1777556/alternatives-to-gprof/1779343#1779343 – Mike Dunlavey Jun 13 '10 at 13:17
  • Good text. Thank you very much :D – arthurprs Jun 13 '10 at 14:57

3 Answers3

1

I'd guess it is some fluke in run-time optimisation - one uses a register and the other doesn't or something minor like that.

The system clock probably runs to a precision of 100nsec. The average call time 30nsec or 25nsec is less than one clock tick. A rounding error of 5% of a clock tick is pretty small. Both times are near enough zero.

Michael J
  • 7,631
  • 2
  • 24
  • 30
  • The assembly code is exactly the same, it's the first think i cheeked. I modified the code to spend more time on the functions and the difference really drooped, but still exists. I think it's caused by some cpu behavior like: branch predic., cache misses, etc.. – arthurprs Jun 12 '10 at 16:19
1

My guess: it is an artifact of the way mcount data gets interpreted. The granularity for mcount (monitor.h) is on the order of a 32 bit longword - 4 bytes on my system. So you would not expect this: I get different reports from prof vs gprof on the EXACT same mon.out file. solaris 9 -

prof 
 %Time Seconds Cumsecs  #Calls   msec/call  Name
  46.4    2.35    2.3559999998      0.0000  .div
  34.8    1.76    4.11120000025      0.0000  _mcount
  10.1    0.51    4.62       1    510.      main
   5.3    0.27    4.8929999999      0.0000  one
   3.4    0.17    5.0629999999      0.0000  two
   0.0    0.00    5.06       1      0.      _fpsetsticky
   0.0    0.00    5.06       1      0.      _exithandle
   0.0    0.00    5.06       1      0.      _profil
   0.0    0.00    5.06      20      0.0     _private_exit, _exit
   0.0    0.00    5.06       1      0.      exit
   0.0    0.00    5.06       4      0.      atexit


gprof
   %  cumulative    self              self    total
 time   seconds   seconds    calls  ms/call  ms/call name
 71.4       0.90     0.90        1   900.00   900.00  key_2_text        <cycle 3> [2]
  5.6       0.97     0.07   106889     0.00     0.00  _findbuf [9]
  4.8       1.03     0.06   209587     0.00     0.00  _findiop [11]
  4.0       1.08     0.05                            __do_global_dtors_aux [12]
  2.4       1.11     0.03                            mem_init [13]
  1.6       1.13     0.02   102678     0.00     0.00  _doprnt [3]
  1.6       1.15     0.02                            one [14]
  1.6       1.17     0.02                            two [15]
  0.8       1.18     0.01   414943     0.00     0.00  realloc   <cycle 3> [16]
  0.8       1.19     0.01   102680     0.00     0.00  _textdomain_u     <cycle 3> [21]
  0.8       1.20     0.01   102677     0.00     0.00  get_mem [17]
  0.8       1.21     0.01                            $1 [18]
  0.8       1.22     0.01                            $2 [19]
  0.8       1.23     0.01                            _alloc_profil_buf [22]
  0.8       1.24     0.01                            _mcount (675)
jim mcnamara
  • 16,005
  • 2
  • 34
  • 51
  • I failed to mention that the point was: the time spent in two() & one() varies in the two reports above. In one the times are equal, the other they are not. – jim mcnamara Jun 12 '10 at 19:19
0

Is it always the first one called that is slightly slower? If that's the case, I would guess it is a CPU cache doing it's thing. or it could be lazy paging by the operating system.

BTW: what optimization flags are compiling with?

Evan Teran
  • 87,561
  • 32
  • 179
  • 238