Variable performance of busy wait loop?

Question

I am evaluating the performance of a busy wait loop for firing events at consistent intervals. I have noticed some odd behavior using the following code:

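#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

/* Not shown in the original question; a minimal definition is assumed here.
   Computes *result = x - y for normalized timespecs; returns 1 if negative. */
int timespec_subtract(struct timespec *result, struct timespec x, struct timespec y) {
    if (x.tv_nsec < y.tv_nsec) {  /* borrow one second */
        x.tv_sec -= 1;
        x.tv_nsec += 1000000000L;
    }
    result->tv_sec = x.tv_sec - y.tv_sec;
    result->tv_nsec = x.tv_nsec - y.tv_nsec;
    return result->tv_sec < 0;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s iterations\n", argv[0]);
        return 1;
    }
    int iterations = atoi(argv[1]) + 1;

    struct timespec t[2], diff;

    for (int i = 0; i < iterations; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t[0]);

        /* volatile keeps the compiler from removing the empty loop;
           renamed so it no longer shadows the outer i */
        static volatile int spin;
        for (spin = 0; spin < 200000; spin++)
            ;

        clock_gettime(CLOCK_MONOTONIC, &t[1]);

        timespec_subtract(&diff, t[1], t[0]);
        printf("%lld\n", (long long)diff.tv_sec * 1000000000LL + diff.tv_nsec);
    }
    return 0;
}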

On the test machine (dual 14-core E5-2683 v3 @ 2.00 GHz, 256 GB DDR4), 200k iterations of the for loop take approximately 1 ms. Or maybe not:

1030854
1060237
1012797
1011479
1025307
1017299
1011001
1038725
1017361
... (about 700 lines later)
638466
638546
638446
640422
638468
638457
638468
638398
638493
640242
... (about 200 lines later)
606460
607013
606449
608813
606542
606484
606990
606436
606491
606466
... (about 3000 lines later)
404367
404307
404309
404306
404270
404370
404280
404395
404342
406005

When the times shift down the third time, they stay mostly consistent (within about 2 or 3 µs), except for occasionally jumping up to about 450 µs for a few hundred iterations. This behavior is repeatable on similar machines and over many runs.

I understand that busy loops can be optimized out by the compiler, but I don't think that's the issue here, since the loop counter is volatile. I don't think cache should be affecting it either, because no invalidation should be taking place, and cache effects wouldn't explain the sudden speedup. I also tried using a register int for the loop counter, with no noticeable effect.

Any thoughts on what is going on, and how to make this (more) consistent?

EDIT: For reference, running this program for 10k iterations with usleep, nanosleep, or the busy wait shown above reports about 20,000 involuntary context switches under time -v in every case.

Sorry, but your approach is completely wrong. You cannot get truly reliable timing on a PC-like system that way. It is certainly an XY problem. Please state what you **actually** want to accomplish and all relevant details. — too honest for this site, Jul 07 '16 at 18:58
What I actually want to accomplish is to understand why the performance of the busy loop changes, as I stated in the question title. I'm aware of alternative methods of timing my program. — Rakurai, Jul 07 '16 at 19:18
@Mysticial - context switches away from the process would explain a large number occasionally, not a consistent change in how long it takes to perform 200k iterations through a busy loop. — Rakurai, Jul 07 '16 at 19:19
Among other things, clock speed can vary. Don't know if that is what you are seeing or not. — Shannon Severance, Jul 07 '16 at 19:46
That's an interesting idea, I had forgotten that the cores can be clocked down when idle. I wonder if there's a convenient way to tell if that's going on, and possibly avoid it? — Rakurai, Jul 07 '16 at 19:49
I *guess* what you're seeing is the scheduler (or rather, the load balancer): when it realizes that you are hogging one core and the load across cores becomes severely imbalanced, it pulls other processes off your core onto other, less-loaded cores. The load balancer runs at a much lower frequency than the scheduler. The steps you see would then be processes being pulled off your core and no longer sharing it with your process. — tofro, Jul 07 '16 at 21:52
@tofro That makes sense, except it's a 28 core machine that is idle other than my tasks. Maybe I'll try masking the kernel processes off a particular core and try running it there. — Rakurai, Jul 14 '16 at 01:38

dbush · Answer 1 · 2016-07-12T18:16:49.643

One big issue with busy waiting is that, besides using up CPU resources, the amount of time you wait will be highly dependent on the CPU clock speed. So the same loop can run for wildly different times on different machines.

The problem with any method of sleeping is that, due to OS scheduling, you may end up sleeping for longer than intended. The man page for nanosleep says that it uses the rem argument to tell you the remaining time if a signal interrupted the sleep, but it says nothing about waiting too long.

You need to grab the timestamp after each call to usleep so you know how long you actually slept for. If you slept too short, you add the deficit. If you slept too long, you subtract the overage.

Here's an example of how I did this in UFTP, a multicast file transfer application, in order to send packets at a consistent speed:

int64_t diff_usec(struct timeval t2, struct timeval t1)
{
    return (t2.tv_usec - t1.tv_usec) +
            (int64_t)1000000 * (t2.tv_sec - t1.tv_sec);
}

...

        int32_t packet_wait = 10000;
        int64_t overage = 0, tdiff;
        struct timeval current_sent, last_sent;

        gettimeofday(&last_sent, NULL);

        while(...) {
            ...

            if (packet_wait > overage) {
                usleep(packet_wait - (int32_t)overage);
            }
            gettimeofday(&current_sent, NULL);
            tdiff = diff_usec(current_sent, last_sent);
            overage += tdiff - packet_wait;

            last_sent = current_sent;
            ...
        }

Thank you for your comment, it is useful information and a correct way to control timing in case the sleep was too long. However, it's not what I'm asking. I'm trying to figure out why the performance of the busy loop changes in steps as the program progresses. — Rakurai, Jul 07 '16 at 19:21
@Rakurai The reason for this is entirely dependent on the OS. Busy wait loops are highly unpredictable because of this. Unless you're willing to dig deep into the source code of the OS scheduler (assuming you're on Linux or some other open source OS), this is not a path you want to go down. — dbush, Jul 07 '16 at 19:27

score 1 · Answer 2 · answered Jul 07 '16 at 19:39

I'd make two points:
- Due to context switching, sleep/usleep may sleep for longer than expected.
- Moreover, if there is higher-priority work, such as interrupt handling, there may be situations in which the sleep does not take effect as intended at all.

Thus, if you need a precise delay in your application, you can use gettimeofday to measure how long the previous sleep actually took and subtract the excess from the delay passed to the next sleep/usleep call.
