Problem
I'm trying to measure the CPU time taken by some functions as my program runs. However, I found that clock() gives odd results when called after nanosleep(). I also found that using some CPU after nanosleep() clears this effect (hence the "busy clear" below).
I'm running Void Linux
$ uname -a
Linux skylittlesystem 5.4.44_1 #1 SMP PREEMPT Wed Jun 3 14:54:55 UTC 2020 x86_64 GNU/Linux
And my glibc is
$ xbps-query glibc | grep pkgver
pkgver: glibc-2.30_1
Below is my clock_nanosleep.c, which demonstrates the issue, along with some output I got from it.
Could someone please explain what's happening?
Sample program
/*
 * NOTE: build with -O0 or gcc will optimize away the busy wait.
 *
 * This program is meant to show the impact nanosleep() can have on clock()
 * readings, and that it clears with CPU use (busy clear).
 *
 * Usage:
 *
 *   ./clock_nanosleep [row1] [row2] ... [rowN]
 *
 * where each row may be one of:
 *
 *   '0'  8 readings
 *   '1'  8 readings, with busy_clear() before each
 *   '2'  8 readings, with nanosleep() before each
 *   '3'  8 readings, with nanosleep() and busy_clear() before each
 *
 * Example:
 *
 *   $ ./clock_nanosleep 0 1 2 3 0 1 2 3
 */
#define _POSIX_C_SOURCE 199309L /* for nanosleep() under strict -std= modes */
#include <stdio.h>
#include <time.h>

#define N 8

long t_total, n_runs;

void busy()
{
    int i;
    for (i = 0; i < 100000; ++i);
}

#define L 2048
int x[L] = {0}, y[L] = {0};

void busy_clear()
{
    int i, j;
    for (i = 0; i < L; ++i)
        for (j = 0; j < L; ++j)
            x[i] = y[j];
}

void run(int sleep, int clear)
{
    clock_t t1, t2;
    long t_run, dt, i;
    struct timespec tv;

    tv.tv_sec = 0;
    tv.tv_nsec = 10000000; /* 10 ms */
    t_run = 0;
    printf("%7d\t%7d\t", sleep, clear);
    for (i = 0; i < N; ++i)
    {
        if (sleep)
            nanosleep(&tv, NULL);
        if (clear)
            busy_clear();
        t1 = clock();
        busy();
        t2 = clock();
        dt = ((t2 - t1) * 1000000) / CLOCKS_PER_SEC; /* microseconds */
        t_run += dt;
        printf("%7ld\t", dt);
    }
    printf("%7ld\t%7ld\n", t_run, (t_run / N));
    t_total += t_run;
    ++n_runs;
}

int main(int argc, char* argv[])
{
    int i;

    printf(" sleep\t clear\t");
    for (i = 0; i < N; ++i)
        printf(" t%d\t", i);
    printf(" total\t avg\n");
    t_total = 0;
    n_runs = 0;
    for (i = 1; i < argc; ++i) /* skip argv[0], the program name */
    {
        switch (argv[i][0])
        {
        case '0': run(0, 0); break;
        case '1': run(0, 1); break;
        case '2': run(1, 0); break;
        case '3': run(1, 1); break;
        }
    }
    putchar('\n');
    printf("total\t%7ld us\n", t_total);
    if (n_runs) /* avoid division by zero when no rows were given */
        printf("average\t%7ld us\n", (t_total / (N * n_runs)));
    return 0;
}
With vs without nanosleep()
If I run it with nanosleep() disabled, I get a spike on the first row, and then it stabilizes:
$ time ./clock_nanosleep 0 0 0 0 0 0 0 0
sleep clear t0 t1 t2 t3 t4 t5 t6 t7 total avg
0 0 914 921 936 998 974 985 1013 999 7740 967
0 0 539 246 242 271 249 220 220 267 2254 281
0 0 283 237 245 257 242 239 254 268 2025 253
0 0 279 245 247 265 225 242 248 268 2019 252
0 0 286 246 260 270 252 220 220 266 2020 252
0 0 265 250 273 252 251 226 266 250 2033 254
0 0 274 260 267 249 247 247 265 250 2059 257
0 0 265 252 266 250 251 248 273 249 2054 256
total 22204 us
average 346 us
real 0m0,026s
user 0m0,024s
sys 0m0,002s
and if I run it with nanosleep() enabled, the readings are higher (more than 2x):
$ time ./clock_nanosleep 2 2 2 2 2 2 2 2
sleep clear t0 t1 t2 t3 t4 t5 t6 t7 total avg
1 0 554 781 933 925 1009 988 994 980 7164 895
1 0 1000 1000 974 1005 984 992 1005 1008 7968 996
1 0 1006 939 978 909 937 936 1006 1006 7717 964
1 0 831 893 989 953 946 997 1000 889 7498 937
1 0 981 926 930 945 987 959 930 993 7651 956
1 0 807 988 856 946 996 992 693 892 7170 896
1 0 833 892 1006 907 998 970 972 751 7329 916
1 0 756 918 931 997 898 999 944 964 7407 925
total 59904 us
average 936 us
real 0m0,715s
user 0m0,064s
sys 0m0,002s
The readings from user and total are not very different (< 10%). I can also see with perf and valgrind that in both cases busy() takes around 90% of the CPU time.
With vs without busy clear
Repeating the process with nanosleep(), but with the busy clear hack:
$ time ./clock_nanosleep 3 3 3 3 3 3 3 3
sleep clear t0 t1 t2 t3 t4 t5 t6 t7 total avg
1 1 243 265 233 250 222 251 252 222 1938 242
1 1 256 250 243 249 250 248 263 249 2008 251
1 1 266 250 248 223 254 284 267 248 2040 255
1 1 249 249 252 243 265 270 267 244 2039 254
1 1 249 305 332 334 248 252 221 252 2193 274
1 1 258 275 250 251 250 241 221 251 1997 249
1 1 251 269 252 254 229 236 248 234 1973 246
1 1 309 221 249 235 222 249 252 259 1996 249
total 16184 us
average 252 us
real 0m1,583s
user 0m0,926s
sys 0m0,006s
and now with just the busy clear, but without nanosleep():
$ time ./clock_nanosleep 1 1 1 1 1 1 1 1
sleep clear t0 t1 t2 t3 t4 t5 t6 t7 total avg
0 1 251 250 252 247 223 251 247 227 1948 243
0 1 251 220 250 252 248 247 248 250 1966 245
0 1 251 248 251 249 244 222 222 220 1907 238
0 1 250 267 240 223 249 251 249 249 1978 247
0 1 249 244 250 246 271 247 248 251 2006 250
0 1 251 222 248 247 251 251 248 248 1966 245
0 1 266 267 248 220 220 266 223 247 1957 244
0 1 233 250 220 222 250 266 260 259 1960 245
total 15688 us
average 245 us
real 0m0,705s
user 0m0,703s
sys 0m0,001s
Note that the readings from these two are very similar to the first case (without nanosleep() and without busy clear), except they don't spike on the first row.
All together
Running all cases together yields:
$ time ./clock_nanosleep 0 1 2 3 0 1 2 3
sleep clear t0 t1 t2 t3 t4 t5 t6 t7 total avg
0 0 868 966 918 919 476 220 243 260 4870 608
0 1 265 245 268 222 244 265 251 261 2021 252
1 0 413 561 814 1011 995 1013 917 912 6636 829
1 1 249 249 247 250 269 221 362 221 2068 258
0 0 272 239 253 240 262 258 234 227 1985 248
0 1 249 250 264 273 222 246 250 241 1995 249
1 0 353 655 902 954 919 1003 906 903 6595 824
1 1 249 322 249 249 249 251 251 250 2070 258
total 28240 us
average 441 us
real 0m0,762s
user 0m0,431s
sys 0m0,006s
Help!
Answer: clock() doesn't scale with the processor
As pointed out in the comments and the linked answer, my scaling_governor being set to powersave was the main culprit.
Because clock() measures CPU time, but not idle time (like time spent in nanosleep()), I was erroneously assuming that it somehow ticked per CPU instruction executed and, as such, scaled up and down with the processor. It doesn't: it counts CPU time in ordinary wall-clock units, so when the governor keeps the frequency low after a sleep, the same busy() loop simply takes longer. Presumably the busy clear works because the CPU-bound loop gives the governor a reason to ramp the frequency back up before the measurement.
So for the sake of completeness, here are the CPU details I forgot to add:
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
stepping : 7
microcode : 0x23
cpu MHz : 2990.487
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4983.97
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
... other cores omitted; only the ids and MHz vary ...
And another run with scaling_governor set to performance:
$ time ./clock_nanosleep 0 1 2 3 0 1 2 3
sleep clear t0 t1 t2 t3 t4 t5 t6 t7 total avg
0 0 248 245 253 240 245 249 255 243 1978 247
0 1 251 246 256 250 251 251 252 249 2006 250
1 0 238 310 329 351 312 310 314 279 2443 305
1 1 249 249 249 249 248 247 249 250 1990 248
0 0 248 248 252 259 258 245 249 251 2010 251
0 1 231 252 268 263 270 249 245 241 2019 252
1 0 319 334 336 340 310 312 308 310 2569 321
1 1 248 250 259 261 249 245 222 252 1986 248
total 17001 us
average 265 us
real 0m0,686s
user 0m0,359s
sys 0m0,003s
I still consistently observe a ~60 us difference, but I assume this is due to Intel Turbo Boost 2.0 and other magic bits; the CPU MHz in /proc/cpuinfo fluctuates up and down each time I look.
Thank you very much for the answers! <3