
I am trying to estimate the function-call overhead in C. I have a trivial function marked with __attribute__((optimize("O0"))) so that GCC does not optimize it away:

int __attribute__((optimize("O0"))) func(int a)
{
    return (a+a);
}

I am using the method described in the Intel paper http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html to measure the time, so the measurements should be fairly precise.
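For reference, a minimal sketch of what start_timer() and end_timer() might look like, assuming x86-64 and GCC or Clang inline assembly. It follows the CPUID/RDTSC at the start and RDTSCP/CPUID at the end pattern that the Intel paper describes. Zeroing EAX before CPUID and the clock_gettime() fallback for non-x86-64 targets are additions not taken from the paper. On CPUs with an invariant TSC, the counter ticks at a constant reference rate rather than at the current core clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>

#if defined(__x86_64__)

/* Read the TSC at the start of the measured region.
 * CPUID is a serializing instruction, so earlier instructions finish
 * before RDTSC reads the counter. EAX is zeroed to select CPUID leaf 0. */
static inline uint64_t start_timer(void)
{
    uint32_t hi, lo;
    __asm__ __volatile__(
        "xor %%eax, %%eax\n\t"
        "cpuid\n\t"
        "rdtsc\n\t"
        "mov %%edx, %0\n\t"
        "mov %%eax, %1\n\t"
        : "=r"(hi), "=r"(lo)
        :
        : "%rax", "%rbx", "%rcx", "%rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Read the TSC at the end of the measured region.
 * RDTSCP waits for the preceding instructions to execute before reading
 * the counter; the trailing CPUID stops later code from starting early. */
static inline uint64_t end_timer(void)
{
    uint32_t hi, lo;
    __asm__ __volatile__(
        "rdtscp\n\t"
        "mov %%edx, %0\n\t"
        "mov %%eax, %1\n\t"
        "xor %%eax, %%eax\n\t"
        "cpuid\n\t"
        : "=r"(hi), "=r"(lo)
        :
        : "%rax", "%rbx", "%rcx", "%rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

#else
#include <time.h>

/* Fallback (assumption, not from the paper): monotonic nanoseconds, not cycles. */
static inline uint64_t start_timer(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static inline uint64_t end_timer(void)
{
    return start_timer();
}
#endif
```

The difference end_timer() - start_timer() also contains the cost of the timing instructions themselves, so the paper recommends measuring an empty region the same way and treating that as the baseline.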

I call the function several times in a loop and time each call individually:

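uint64_t t1, t2, cycles[10];
int i, x;

for (i = 0; i < 10; i++)
{
    t1 = start_timer();
    x = func(i);
    t2 = end_timer();

    cycles[i] = t2 - t1;   /* keep every sample instead of overwriting one variable */
}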

I notice that the first time the function is called (i=0), it takes more cycles (~10x) than the subsequent calls. Why does this happen?

AbhinavChoudhury
  • Caching. The first time you call the function, its instructions won't be in cache. – Petr Skocik Sep 18 '16 at 22:31
  • If you're compiling with -fpic or -fPIC and the function symbol has default visibility, there will also be a first time call overhead associated with symbol lookup within the dso. – Petr Skocik Sep 18 '16 at 22:36
  • Ulrich Drepper treats both of the issues in excruciating detail in https://www.akkadia.org/drepper/cpumemory.pdf and in http://www.akkadia.org/drepper/dsohowto.pdf. – Petr Skocik Sep 18 '16 at 22:39
  • Can be anything, from late linking to DRAM overheated, thus stalling for a short time (yes, modern systems might do that), retransmission on the bridges, buffering, cache, the weather, cosmic rays, comic rays, ray ban, etc. – too honest for this site Sep 18 '16 at 22:59
  • Why don't you move the `t1` and `t2` timer marks outside the loop, then divide the time by 10? – Weather Vane Sep 18 '16 at 23:11
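Weather Vane's suggestion can be combined with a few untimed warm-up calls that absorb the one-time costs Petr Skocik describes (cold instruction cache, and lazy symbol lookup under -fPIC). Below is a hypothetical harness sketching that idea; average_call_ns, now_ns and sink are illustrative names, not from the question. To stay self-contained it times with POSIX clock_gettime() in nanoseconds instead of the RDTSC timer, and the noinline attribute is an extra guard added here. The per-call figure still includes the loop's own overhead, so an empty-loop baseline would need to be subtracted for a tighter estimate.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Function under test, as in the question (GCC-specific attributes;
 * noinline is an addition to make sure a real call is made). */
int __attribute__((noinline, optimize("O0"))) func(int a)
{
    return a + a;
}

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Volatile sink so the compiler cannot discard the calls' results. */
static volatile int sink;

/* Mean nanoseconds per call over n timed calls, after `warmup` untimed calls. */
double average_call_ns(int warmup, int n)
{
    /* Untimed calls: bring the code into cache and trigger any lazy binding. */
    for (int i = 0; i < warmup; i++)
        sink = func(i);

    /* One timed region around the whole batch, as suggested in the comments. */
    uint64_t t1 = now_ns();
    for (int i = 0; i < n; i++)
        sink = func(i);
    uint64_t t2 = now_ns();

    return (double)(t2 - t1) / n;
}
```

For example, average_call_ns(100, 1000000) makes 100 untimed calls, then times one million calls and returns their mean duration.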
