0

I have a single threaded program that does some operations on a large file (~16GB) in a loop, and has a variable count that increments on each loop, I want to be able to see what the count is at by outputting to console every so often, but I dont want it to significantly slow down my program, so I wanted to know if it would be faster to use modulo on count, and output if it equals 0 (if (count%1_000_000==0) {std::cout << count << endl;}), or if I should use system time, and if more than .25 of a second has passed, print to console, or just print to console every time it loops through.

the loop will run a few billion times aprox.

Reed Thorngag
  • 105
  • 1
  • 5
  • 2
    Note that if you are running windows, the console simply is fairly slow and there's not a whole lot to be done about it. – SoronelHaetir Jul 13 '22 at 00:28
  • If you replace the `endl` with a newline, `'\n'`, the console may not update as often, the data is written to a buffer that will update the console when the buffer fills up, and the program will be a LOT faster. Details: ["std::endl" vs "\n"](https://stackoverflow.com/questions/213907/stdendl-vs-n) – user4581301 Jul 13 '22 at 00:41

1 Answers1

1

count % 1'000'000 is a slow operation, even though the compiler optimizes that to multiplication by an inverse. If you use a power of 2 on the other hand this operation becomes much simpler. For example here is x % n == 0 for 1'000'000 and 1 << 20 == 1'048'576 with int.

mod_1_000_000(int):
        imul    edi, edi, 1757569337
        add     edi, 137408
        ror     edi, 6
        cmp     edi, 4294
        seta    al
        ret

mod_1_048_576(int):
        and     edi, 1048575
        setne   al
        ret

If count is uint64_t the difference gets much more pronounced.

An if (count % 1'048'576 == 0) will be cheap to compute and the branch predictor will only get about 1 miss in a million. So this would be cheap. You can probably make it even better by marking it unlikely so the code for printing console output gets put into a cold path.


Getting the system time and printing every .25 seconds sounds great. But if you are getting the system time inside the loop that will be millions of function calls. Those will be expensive, far more than count % (1 << 20).

Unfortunately you can't use alarm to interrupt the code periodically because you can't print to the console in a signal handler. But you could use multithreading, having one thread do the work and the other print updates and sleep in a loop.

Problem there is how to get the count from one thread to the other. The compiler has probably optimized that into a register so the other thread reading the memory location where count is stored won't show the actual count. You would have to make the variable atomic and that would increase the cost of using it.

Bets bet would be using

if (count % (1 << 20) == 0) atomic_count = count;

and update a shared atomic variable every so often. But is all that overhead of multithreading worth it? You aren't avoiding the if in the inner loop, just reducing the amount of code executed once in a blue moon.

Goswin von Brederlow
  • 11,875
  • 2
  • 24
  • 42