count % 1'000'000 is a comparatively slow operation, even though the compiler optimizes it into a multiplication by an inverse. If you use a power of 2, on the other hand, the operation becomes much simpler. For example, here is the code generated for x % n != 0 with int, for n = 1'000'000 and for n = 1 << 20 == 1'048'576:
mod_1_000_000(int):
imul edi, edi, 1757569337
add edi, 137408
ror edi, 6
cmp edi, 4294
seta al
ret
mod_1_048_576(int):
and edi, 1048575
setne al
ret
If count is uint64_t, the difference gets much more pronounced.
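A sketch of C++ functions consistent with the listing above, so you can compare the generated code yourself (for example on Compiler Explorer with optimization enabled). The two names come from the asm labels; the bodies (testing x % n != 0), the uint64_t variants and low_20_bits_set are hypothetical additions for illustration. For an unsigned value, x % 2^k is exactly the low k bits, which is why the power-of-2 test reduces to a single AND.

```cpp
#include <cstdint>

// int versions, matching the labels in the asm listing (bodies assumed).
constexpr bool mod_1_000_000(int x) { return x % 1'000'000 != 0; }
constexpr bool mod_1_048_576(int x) { return x % 1'048'576 != 0; }

// Hypothetical uint64_t versions: compare their asm to see the bigger gap.
constexpr bool mod_1_000_000_u64(std::uint64_t x) { return x % 1'000'000 != 0; }
constexpr bool mod_1_048_576_u64(std::uint64_t x) {
    return x % (std::uint64_t{1} << 20) != 0;
}

// What the power-of-2 modulo boils down to: test the low 20 bits.
constexpr bool low_20_bits_set(std::uint64_t x) {
    return (x & ((std::uint64_t{1} << 20) - 1)) != 0;
}
```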
A check like if (count % 1'048'576 == 0) is cheap to compute, and the branch predictor will mispredict it only about once in a million iterations, so its overall cost is small. You can probably make it even better by marking the branch unlikely (for example with the C++20 [[unlikely]] attribute) so the code that prints the console output is moved into a cold path.
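A minimal sketch of such a loop, assuming C++20 for [[unlikely]] (with GCC or Clang, __builtin_expect is an older alternative). The function run and the sum-of-squares body are hypothetical stand-ins for the real work; the progress line goes to stderr only once every 2^20 iterations.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical work loop: sums count * count and reports progress
// once every 2^20 iterations.
std::uint64_t run(std::uint64_t iterations) {
    std::uint64_t sum = 0;
    for (std::uint64_t count = 0; count < iterations; ++count) {
        sum += count * count;  // stand-in for the real work
        // Power-of-2 modulus: compiles to an AND plus a rarely taken branch.
        if (count % 1'048'576 == 0) [[unlikely]] {
            std::fprintf(stderr, "\rprogress: %llu",
                         static_cast<unsigned long long>(count));
        }
    }
    std::fputc('\n', stderr);
    return sum;
}
```

Marking the branch unlikely is only a hint; the compiler is free to lay the code out however it likes, so check the generated asm if it matters.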
Getting the system time and printing every 0.25 seconds sounds great. But if you read the system time inside the loop, that means millions of function calls, and those are far more expensive than count % (1 << 20).
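One possible way to combine the two ideas, offered as an assumption rather than something the answer proposes: use the cheap power-of-2 check to decide when to read the clock, and only print if at least 250 ms have passed since the last report. That limits clock reads to one per 2^20 iterations. The function name run_timed and the loop body are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical loop that reads the clock only once per 2^20 iterations
// and prints at most about every 250 ms.
std::uint64_t run_timed(std::uint64_t iterations) {
    using Clock = std::chrono::steady_clock;
    auto last_report = Clock::now();
    std::uint64_t sum = 0;
    for (std::uint64_t count = 0; count < iterations; ++count) {
        sum += count * count;  // stand-in for the real work
        if (count % (std::uint64_t{1} << 20) == 0) [[unlikely]] {
            auto now = Clock::now();  // rare clock read
            if (now - last_report >= std::chrono::milliseconds(250)) {
                std::fprintf(stderr, "\rprogress: %llu",
                             static_cast<unsigned long long>(count));
                last_report = now;
            }
        }
    }
    std::fputc('\n', stderr);
    return sum;
}
```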
Unfortunately, you can't use alarm to interrupt the code periodically, because you can't safely print to the console from a signal handler (stdio functions such as printf are not async-signal-safe). But you could use multithreading: one thread does the work while the other prints updates and sleeps in a loop.
The problem there is how to get count from one thread to the other. The compiler has probably kept count in a register, so the other thread reading the memory location where count is stored won't see the actual value. You would have to make the variable atomic, and that would increase the cost of every access to it.
Your best bet would be to keep count as an ordinary variable and use
if (count % (1 << 20) == 0) atomic_count = count;
to update a shared atomic variable only every so often. But is all that multithreading overhead worth it? You aren't avoiding the if in the inner loop, just reducing the amount of code executed once in a blue moon.
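A sketch of that two-thread setup, assuming C++20 and std::thread. Only atomic_count comes from the text; worker, reporter, done and run_threaded are hypothetical names. The worker keeps count in a plain variable and publishes it with a relaxed store every 2^20 iterations, while the reporter thread prints the last published value and sleeps for 250 ms.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>

std::atomic<std::uint64_t> atomic_count{0};  // last published progress
std::atomic<bool> done{false};               // hypothetical stop flag

std::uint64_t worker(std::uint64_t iterations) {
    std::uint64_t sum = 0;
    for (std::uint64_t count = 0; count < iterations; ++count) {
        sum += count * count;  // stand-in for the real work
        if (count % (1 << 20) == 0) [[unlikely]]
            atomic_count.store(count, std::memory_order_relaxed);
    }
    atomic_count.store(iterations, std::memory_order_relaxed);
    done.store(true, std::memory_order_release);
    return sum;
}

void reporter() {
    while (!done.load(std::memory_order_acquire)) {
        std::fprintf(stderr, "\rprogress: %llu",
                     static_cast<unsigned long long>(
                         atomic_count.load(std::memory_order_relaxed)));
        std::this_thread::sleep_for(std::chrono::milliseconds(250));
    }
    std::fputc('\n', stderr);
}

std::uint64_t run_threaded(std::uint64_t iterations) {
    atomic_count.store(0);
    done.store(false);
    std::thread printer(reporter);  // prints and sleeps in a loop
    std::uint64_t sum = worker(iterations);
    printer.join();
    return sum;
}
```

The inner loop still contains the same if as the single-threaded version; the threads only move the printing and the sleeping out of it.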