
Problem

I'm trying to measure the CPU time taken by some functions as my program runs. However, I found that clock() gives odd results when called after nanosleep().

I also found that burning some CPU after nanosleep() clears this effect; I call this a "busy clear" below.

I'm running Void Linux

$ uname -a
Linux skylittlesystem 5.4.44_1 #1 SMP PREEMPT Wed Jun 3 14:54:55 UTC 2020 x86_64 GNU/Linux

And my glibc is

$ xbps-query glibc | grep pkgver
pkgver: glibc-2.30_1

Below is my clock_nanosleep.c, which demonstrates the issue, along with some output I got from it.

Could someone, please, explain what's happening?

Sample program

/*
 * NOTE: build with -O0 or gcc will optimize the busy wait
 *
 * This program is meant to show the impact nanosleep() can have on clock()
 * readings, and that it clears with CPU use (busy clear).
 *
 * Usage:
 *
 * ./clock_nanosleep [row1] [row2] ... [rowN]
 *
 * row may be one of:
 *
 *      '0'     8 readings
 *      '1'     8 readings, with busy_clear() before each
 *      '2'     8 readings, with nanosleep() before each
 *      '3'     8 readings, with nanosleep() and busy_clear() before each
 *
 * example:
 *
 *      $ ./clock_nanosleep 0 1 2 3 0 1 2 3
 *
 */

#define _POSIX_C_SOURCE 199309L /* declare nanosleep() under strict compilation */

#include <stdio.h>
#include <time.h>

#define N 8

long t_total, n_runs;

void busy()
{
        int i;
        for (i = 0; i < 100000; ++i);
}

#define L 2048
int x[L] = {0}, y[L] = {0};
void busy_clear()
{
        int i, j;
        for (i = 0; i < L; ++i)
                for (j = 0; j < L; ++j)
                        x[i] = y[j];
}

void run(int sleep, int clear)
{
        clock_t t1, t2;
        long t_run, dt, i;
        struct timespec tv;

        tv.tv_sec = 0;
        tv.tv_nsec = 10000000;

        t_run = 0;
        printf("%7d\t%7d\t", sleep, clear);
        for (i = 0; i < N; ++i)
        {
                if (sleep)
                        nanosleep(&tv, NULL);

                if (clear)
                        busy_clear();

                t1 = clock();
                busy();
                t2 = clock();

                dt = ((t2 - t1) * 1000000) / CLOCKS_PER_SEC;
                t_run += dt;
                printf("%7ld\t", dt);
        }

        printf("%7ld\t%7ld\n", t_run, (t_run / N));

        t_total += t_run;
        ++n_runs;
}

int main(int argc, char* argv[])
{
        int i;

        printf("  sleep\t  clear\t");
        for (i = 0; i < N; ++i)
                printf("     t%d\t", i);
        printf("  total\t    avg\n");

        t_total = 0;
        n_runs = 0;

        for (i = 1; i < argc; ++i) /* skip argv[0], the program name */
        {
                switch (argv[i][0])
                {
                        case '0': run(0, 0); break;
                        case '1': run(0, 1); break;
                        case '2': run(1, 0); break;
                        case '3': run(1, 1); break;
                }
        }

        putchar('\n');

        printf("total\t%7ld us\n", t_total);
        printf("average\t%7ld us\n", (t_total / (N * n_runs)));

        return 0;
}

With vs without nanosleep()

If I run it with nanosleep() disabled, I get a spike on the first row, and then it stabilizes:

$ time ./clock_nanosleep 0 0 0 0 0 0 0 0
  sleep   clear      t0      t1      t2      t3      t4      t5      t6      t7   total     avg
      0       0     914     921     936     998     974     985    1013     999    7740     967
      0       0     539     246     242     271     249     220     220     267    2254     281
      0       0     283     237     245     257     242     239     254     268    2025     253
      0       0     279     245     247     265     225     242     248     268    2019     252
      0       0     286     246     260     270     252     220     220     266    2020     252
      0       0     265     250     273     252     251     226     266     250    2033     254
      0       0     274     260     267     249     247     247     265     250    2059     257
      0       0     265     252     266     250     251     248     273     249    2054     256

total     22204 us
average     346 us

real    0m0,026s
user    0m0,024s
sys     0m0,002s

and if I run it with nanosleep() enabled, the readings are higher (more than 2x):

$ time ./clock_nanosleep 2 2 2 2 2 2 2 2
  sleep   clear      t0      t1      t2      t3      t4      t5      t6      t7   total     avg
      1       0     554     781     933     925    1009     988     994     980    7164     895
      1       0    1000    1000     974    1005     984     992    1005    1008    7968     996
      1       0    1006     939     978     909     937     936    1006    1006    7717     964
      1       0     831     893     989     953     946     997    1000     889    7498     937
      1       0     981     926     930     945     987     959     930     993    7651     956
      1       0     807     988     856     946     996     992     693     892    7170     896
      1       0     833     892    1006     907     998     970     972     751    7329     916
      1       0     756     918     931     997     898     999     944     964    7407     925

total     59904 us
average     936 us

real    0m0,715s
user    0m0,064s
sys     0m0,002s

The user and total readings are not very different (< 10%). I can also see in perf and valgrind that in both cases busy() takes around 90% of the CPU time.

With vs without busy clear

Repeating the process with nanosleep() but with the busy clear hack:

$ time ./clock_nanosleep 3 3 3 3 3 3 3 3
  sleep   clear      t0      t1      t2      t3      t4      t5      t6      t7   total     avg
      1       1     243     265     233     250     222     251     252     222    1938     242
      1       1     256     250     243     249     250     248     263     249    2008     251
      1       1     266     250     248     223     254     284     267     248    2040     255
      1       1     249     249     252     243     265     270     267     244    2039     254
      1       1     249     305     332     334     248     252     221     252    2193     274
      1       1     258     275     250     251     250     241     221     251    1997     249
      1       1     251     269     252     254     229     236     248     234    1973     246
      1       1     309     221     249     235     222     249     252     259    1996     249

total     16184 us
average     252 us

real    0m1,583s
user    0m0,926s
sys     0m0,006s

and now with just the busy clear, but without nanosleep():

$ time ./clock_nanosleep 1 1 1 1 1 1 1 1
  sleep   clear      t0      t1      t2      t3      t4      t5      t6      t7   total     avg
      0       1     251     250     252     247     223     251     247     227    1948     243
      0       1     251     220     250     252     248     247     248     250    1966     245
      0       1     251     248     251     249     244     222     222     220    1907     238
      0       1     250     267     240     223     249     251     249     249    1978     247
      0       1     249     244     250     246     271     247     248     251    2006     250
      0       1     251     222     248     247     251     251     248     248    1966     245
      0       1     266     267     248     220     220     266     223     247    1957     244
      0       1     233     250     220     222     250     266     260     259    1960     245

total     15688 us
average     245 us

real    0m0,705s
user    0m0,703s
sys     0m0,001s

Note that the readings from these two runs are very similar to the first case (without nanosleep() and without busy clear), except that they don't spike on the first row.

All together

Running all cases together yields:

$ time ./clock_nanosleep 0 1 2 3 0 1 2 3
  sleep   clear      t0      t1      t2      t3      t4      t5      t6      t7   total     avg
      0       0     868     966     918     919     476     220     243     260    4870     608
      0       1     265     245     268     222     244     265     251     261    2021     252
      1       0     413     561     814    1011     995    1013     917     912    6636     829
      1       1     249     249     247     250     269     221     362     221    2068     258
      0       0     272     239     253     240     262     258     234     227    1985     248
      0       1     249     250     264     273     222     246     250     241    1995     249
      1       0     353     655     902     954     919    1003     906     903    6595     824
      1       1     249     322     249     249     249     251     251     250    2070     258

total     28240 us
average     441 us

real    0m0,762s
user    0m0,431s
sys     0m0,006s


Answer: clock() doesn't scale with the processor

As pointed out in the comments and the linked answer, having my scaling_governor set to powersave was the main culprit.

clock() measures CPU time, not idle time (such as time spent in nanosleep()), yet I was erroneously assuming that it somehow ticked per CPU instruction executed, and as such would scale up and down with the processor frequency.

So for the sake of completeness, here are the CPU details I forgot to add:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
stepping        : 7
microcode       : 0x23
cpu MHz         : 2990.487
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 4983.97
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

... other cores omitted; only the ids and MHz vary ...

And another run with scaling_governor set to performance:

$ time ./clock_nanosleep 0 1 2 3 0 1 2 3
  sleep   clear      t0      t1      t2      t3      t4      t5      t6      t7   total     avg
      0       0     248     245     253     240     245     249     255     243    1978     247
      0       1     251     246     256     250     251     251     252     249    2006     250
      1       0     238     310     329     351     312     310     314     279    2443     305
      1       1     249     249     249     249     248     247     249     250    1990     248
      0       0     248     248     252     259     258     245     249     251    2010     251
      0       1     231     252     268     263     270     249     245     241    2019     252
      1       0     319     334     336     340     310     312     308     310    2569     321
      1       1     248     250     259     261     249     245     222     252    1986     248

total     17001 us
average     265 us

real    0m0,686s
user    0m0,359s
sys     0m0,003s

I still consistently observe a ~60 us difference, but I assume this is due to Intel Turbo Boost 2.0 and other magic bits; the CPU MHz in /proc/cpuinfo fluctuates up and down each time I look.

Thank you very much for the answers! <3

pudiva
    As a general principle you should check that you don't have cpu power management interfering and you should probably pin your program to a single core. – Jester Jun 18 '20 at 00:12
  • 1
    FYI, here is what I get on my ryzen 1700x 3800MHz single core running linux kernel 5.0.0 : http://paste.debian.net/1152603/ – Jester Jun 18 '20 at 00:19
  • Looks like `busy_clear()` is enough warm-up to get the CPU back up to max clock speed before your delay loop, if it clocked down during nanosleep. Almost an exact duplicate of [Why does this delay-loop start to run faster after several iterations with no sleep?](https://stackoverflow.com/q/38299023) – Peter Cordes Jun 18 '20 at 00:46
  • CPU frequency scaling is hardware-dependent. You only showed your kernel version, not hardware (CPU model number, e.g. i7 6700k Skylake, or Jester's Ryzen 1700x Zen1) – Peter Cordes Jun 18 '20 at 01:16
  • @PeterCordes and @Jester thank you very much! In fact my assumptions on `clock()` were incorrect. For the sake of completeness, I edited the question with updated results. Thanks lot! <3 – pudiva Jun 18 '20 at 13:31
  • Oh, yes, `clock()` measures *time* not clock-cycles. Note that `CLOCKS_PER_SEC` is an arbitrary compile-time constant so this old legacy API couldn't have been updated for modern systems without breaking things. And yes, it specifically measures user time, not wall-clock, but you're not going to sleep between t1 and t2 anyway so that doesn't matter, at least for a single-threaded process. – Peter Cordes Jun 18 '20 at 13:37
