
I need to create a loop that displays some information on screen every 1/30th of a second. I have been using a rather naïve approach: I measure the elapsed time between the start and the end of one loop iteration and subtract that value from the frame duration. If the difference is greater than 0, I call std::this_thread::sleep_for() with that difference.

While this works in practice, when I measure the total time over a certain number of cycles (say 120), I notice a significant gap between what the system actually does and the expected result. Where I should see more or less 4 seconds, I systematically get times over 5 seconds.

Here is some basic code (it simply calls sleep_for 120 times with a duration of 1/30th of a second):

#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

using namespace std::chrono;

std::chrono::time_point start = std::chrono::high_resolution_clock::now();

int main()
{
    constexpr std::chrono::duration<double, std::micro> one_cycle_duration{1.0 / 30. * 1000000};

    auto global_start = std::chrono::high_resolution_clock::now();

    for (uint32_t i = 0; i < 120; ++i) {
        auto end = std::chrono::high_resolution_clock::now();
        // time elapsed since `start` was last updated (right after the previous sleep)
        auto duration = std::chrono::duration<double, std::milli>(end - start);
        //std::cerr << i << " " << duration.count() << std::endl;
        // sleep for whatever is left of the 1/30 s frame
        auto wait_for = std::chrono::duration<double, std::milli>(one_cycle_duration - duration);
        //std::cerr << i << " " << duration.count() << " " << wait_for.count() << std::endl;
        std::this_thread::sleep_for(wait_for);
        start = std::chrono::high_resolution_clock::now();
    }
    
    auto global_end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> global_elapsed = global_end - global_start;

    std::cerr << global_elapsed.count() << std::endl;

    return 0;
}

Result:

5643.35

I understand that std::this_thread::sleep_for() doesn't guarantee that it will wait for exactly the passed duration -- it only promises to block for at least that long, and scheduling plus the other work in the loop add overhead. Still, I wasn't expecting such a significant time difference. I am already using a high-resolution clock.

So my questions are:

  • am I doing something wrong (to start with)?
  • if not, how can one get significantly better results (what is the recommended practice)? E.g., if I run the loop 120 times with a 1/30th-of-a-second duration, I should get a result fairly close to 4 seconds -- not 4.5 seconds, and certainly not the 5.5 seconds I get now.

Again, I tried to compensate in the loop by only waiting for the expected cycle duration minus the time it took to process the previous iteration. I was hoping that, over time, this would self-correct the issue, but I have yet to see significant improvement (as the code above shows).

Edit/Solution (an improvement, really):

I am editing the question with the hints provided by @HowardHinnant and @NathanOliver (credit goes to them -- see the comments):

  • get the time when the app starts
  • rather than using sleep_for, use sleep_until, where the time point passed to the function is global_start + number_of_cycles * time_per_cycle.
  • PS: I have been using steady_clock instead of high_resolution_clock, as suggested in a post linked in the comments. While the arguments for using steady_clock are understandable, it made no difference in practice.
auto global_start = std::chrono::steady_clock::now();

for (uint32_t i = 0; i < 120; ++i) {
    // sleep until the absolute time at which the i-th frame should start,
    // measured from global_start (no per-iteration error accumulates)
    std::this_thread::sleep_until(global_start + i * one_cycle_duration);
}

auto global_end = std::chrono::steady_clock::now();
std::chrono::duration<double, std::milli> global_elapsed = global_end - global_start;
std::cerr << global_elapsed.count() << std::endl;

Result:

3978.73

This is also a basic problem in numerical precision that I should have thought of: you don't want to accumulate an "error" over time. Instead, you should aim for the correct absolute target at each iteration. Thanks again to those who pointed me in the right direction.
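To illustrate the point, here is a small check (reusing one_cycle_duration from above; the per-frame latency figures will of course vary by machine) that prints how late each wake-up is relative to its absolute target. Because the target for frame i is always computed from global_start, the lateness stays bounded per frame instead of adding up:

#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

int main()
{
    constexpr std::chrono::duration<double, std::micro> one_cycle_duration{1.0 / 30. * 1000000};

    auto global_start = std::chrono::steady_clock::now();
    for (uint32_t i = 1; i <= 120; ++i) {
        auto target = global_start + i * one_cycle_duration;
        std::this_thread::sleep_until(target);
        // how far past the absolute target did this frame actually wake up?
        std::chrono::duration<double, std::milli> late =
            std::chrono::steady_clock::now() - target;
        std::cerr << "frame " << i << " woke " << late.count() << " ms late\n";
    }
    return 0;
}

Each individual frame can still wake a few milliseconds late on a non-real-time OS, but those latencies don't compound, so the total stays close to 4 seconds.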

user18490
  • If you want guaranteed timings you need a real-time OS. Normal systems have so much going on in the background that it is basically impossible to get steady and consistent waits. – NathanOliver Feb 21 '23 at 14:01
  • Also, some advice on `high_resolution_clock`: https://stackoverflow.com/questions/37426832/what-are-the-uses-of-stdchronohigh-resolution-clock – NathanOliver Feb 21 '23 at 14:02
  • Try `sleep_until` on a `system_clock::time_point`. Get `now()`, and then sleep until now+1frame, then now+2frames, etc. – Howard Hinnant Feb 21 '23 at 14:03
  • @NathanOliver: any pointer on how this can be achieved? If possible at all? – user18490 Feb 21 '23 at 14:04
  • @user18490 As Howard mentioned, `sleep_until` might work better for you. It is still going to vary as sleeping allows for context switching and your thread might not wake back up until time has already passed. – NathanOliver Feb 21 '23 at 14:06
  • @HowardHinnant this is awesome. This works great for me (one ChatGPT won't get). Happy to accept this as an answer if you post it as such (I will edit your answer eventually with the code I came up with). – user18490 Feb 21 '23 at 14:10
  • If you value precision more than CPU time, you can try busy-waiting. Especially given you have short waiting times – Sergey Kolesnik Feb 21 '23 at 14:12
  • Is this Windows? If so, you'll probably want to use a Windows-specific function to adjust the timer resolution. Namely, timeGetDevCaps to get the minimum allowed timer resolution, timeBeginPeriod to set it, then timeEndPeriod at program exit. See https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod and also https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/ – George Feb 21 '23 at 14:53 (a sketch of this approach follows these comments)
  • What are `start` and `end` supposed to measure? There’s nothing but `++i` between them, since `end` is initialized in the next iteration. – Davis Herring Feb 21 '23 at 16:01
  • @George it's Windows for now, but ideally I'd like a cross-platform solution. – user18490 Feb 26 '23 at 14:49
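Following up on George's comment about the Windows timer resolution, here is a minimal Windows-only sketch of that idea (it assumes an MSVC build linking winmm.lib; TimerResolutionGuard is just an illustrative name, and error handling is kept to a minimum):

#include <windows.h>
#include <timeapi.h>
#pragma comment(lib, "winmm.lib")

// RAII guard: request the finest timer resolution the system reports at
// startup, and restore the previous setting when the guard is destroyed.
struct TimerResolutionGuard {
    UINT period = 0;
    TimerResolutionGuard() {
        TIMECAPS caps{};
        if (timeGetDevCaps(&caps, sizeof(caps)) == MMSYSERR_NOERROR &&
            timeBeginPeriod(caps.wPeriodMin) == TIMERR_NOERROR) {
            period = caps.wPeriodMin;   // typically 1 ms
        }
    }
    ~TimerResolutionGuard() {
        if (period != 0)
            timeEndPeriod(period);
    }
};

Creating one such guard at the top of main would keep the higher resolution for the program's lifetime; on other platforms it could be compiled out. How much this actually helps depends on the Windows version, as the randomascii post linked in the comment explains.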

1 Answer


Here's my suggestion with a little more detail:

using frames = std::chrono::duration<std::int64_t, std::ratio<1, 30>>;
auto next_start = std::chrono::steady_clock::now() + frames{0};
for (uint32_t i = 0; i < 120; ++i)
{
    // do work here
    next_start += frames{1};
    std::this_thread::sleep_until(next_start);
}
  • I'm using integral arithmetic instead of floating point to show that it can be done. If one adds frames{0} to clock::now() one gets weird units that you don't have to know the details of. All you have to know is that this is an integral-based solution with no truncation or round-off error.

  • I've made the sleep_until independent of the loop index so that your loop doesn't have to have an index. For example, it might be "loop until the user hits the quit button".

  • On the choice of clocks:

    • Choose steady_clock if you want the sleep_until to never be impacted by a system administrator adjusting the computer's clock to keep it in sync with UTC (this doesn't happen across daylight savings adjustments). Note that no clock is perfect and this choice will slowly drift away from exactly 1/30 of a second. This drift is likely to be on the order of a few seconds per week. I.e. this imperfect clock is steady but not perfectly accurate.

    • Choose system_clock if you want to accept tiny adjustments such that the sleep_until will stay in sync with UTC. This means that no matter how long the loop runs, the sleep_until will always be aimed at an integral number of frames from the starting point according to an independent observer with an accurate UTC clock. I.e. this imperfect clock is not perfectly steady, but is adjusted to maintain accuracy.
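For comparison, a minimal sketch of that second choice: the only change to the loop above is the clock used to anchor next_start.

using frames = std::chrono::duration<std::int64_t, std::ratio<1, 30>>;
auto next_start = std::chrono::system_clock::now() + frames{0};  // anchored to the UTC-tracking clock
for (uint32_t i = 0; i < 120; ++i)
{
    // do work here
    next_start += frames{1};
    std::this_thread::sleep_until(next_start);
}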

Howard Hinnant