I am evaluating the performance of a busy wait loop for firing events at consistent intervals. I have noticed some odd behavior using the following code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

int timespec_subtract(struct timespec *, struct timespec, struct timespec);

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s iterations\n", argv[0]);
        return 1;
    }
    int iterations = atoi(argv[1]) + 1;
    struct timespec t[2], diff;

    for (int i = 0; i < iterations; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t[0]);

        /* Busy wait; volatile keeps the compiler from removing the loop. */
        static volatile int j;
        for (j = 0; j < 200000; j++)
            ;

        clock_gettime(CLOCK_MONOTONIC, &t[1]);
        timespec_subtract(&diff, t[1], t[0]);
        /* Elapsed time for this iteration, in nanoseconds. */
        printf("%ld\n", diff.tv_sec * 1000000000 + diff.tv_nsec);
    }
}
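The definition of timespec_subtract is omitted above. A minimal sketch, assuming it computes x - y with the nanosecond field normalized and returns 1 when the result is negative (the body here is an assumption, not necessarily the code used for these measurements):

```c
#include <time.h>

/* Assumed behavior: *result = x - y, with 0 <= tv_nsec < 1e9.
   Returns 1 if the difference is negative, 0 otherwise. */
int timespec_subtract(struct timespec *result, struct timespec x, struct timespec y) {
    result->tv_sec = x.tv_sec - y.tv_sec;
    result->tv_nsec = x.tv_nsec - y.tv_nsec;
    if (result->tv_nsec < 0) {
        /* Borrow one second to keep the nanosecond field non-negative. */
        result->tv_sec -= 1;
        result->tv_nsec += 1000000000L;
    }
    return result->tv_sec < 0;
}
```

With CLOCK_MONOTONIC the second timestamp is never earlier than the first, so the return value is ignored in the loop above.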
On the test machine (dual 14-core E5-2683 v3 @ 2.00GHz, 256GB DDR4), 200k iterations of the inner for loop take approximately 1 ms. Or maybe not:
1030854
1060237
1012797
1011479
1025307
1017299
1011001
1038725
1017361
... (about 700 lines later)
638466
638546
638446
640422
638468
638457
638468
638398
638493
640242
... (about 200 lines later)
606460
607013
606449
608813
606542
606484
606990
606436
606491
606466
... (about 3000 lines later)
404367
404307
404309
404306
404270
404370
404280
404395
404342
406005
When the times shift down the third time, they stay mostly consistent (within about 2 or 3 microseconds), except for occasionally jumping up to about 450us for a few hundred iterations. This behavior is repeatable on similar machines and over many runs.
I understand that busy loops can be optimized out by the compiler, but I don't think that's the issue here. I don't think cache should be affecting it either, because no invalidation should be taking place, and cache effects wouldn't explain the sudden speedup. I also tried using a register int for the loop counter, with no noticeable effect.
Any thoughts on what is going on, and how to make this (more) consistent?
EDIT: For information, running this program with usleep, nanosleep, or the busy wait shown above for 10k iterations shows ~20000 involuntary context switches in each case, as reported by time -v.
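The same counter can also be read from inside the program. A minimal sketch, assuming Linux, where getrusage fills in ru_nivcsw (the helper name involuntary_switches is hypothetical):

```c
#include <sys/resource.h>

/* Hypothetical helper: number of involuntary context switches the
   process has had so far, or -1 if getrusage fails. */
long involuntary_switches(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_nivcsw;
}
```

Calling it before and after the timed loop and printing the difference would show whether the context switches line up with the iterations where the timings shift.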