
I'm working on a program that needs a consistent ~1 ms sleep. The sleep is used to generate a hardware pulse roughly 1 ms long.

I am using the following code for the sleep:

    #include <windows.h>

    // Sleep for roughly usec microseconds using a waitable timer.
    void usleep(__int64 usec)
    {
      HANDLE timer;
      LARGE_INTEGER ft;

      // Convert to 100-nanosecond intervals; a negative value means relative time.
      ft.QuadPart = -(10 * usec);

      timer = CreateWaitableTimer(NULL, TRUE, NULL);
      if (timer == NULL)
        return;

      SetWaitableTimer(timer, &ft, 0, NULL, NULL, 0);
      WaitForSingleObject(timer, INFINITE);
      CloseHandle(timer);
    }

taken from here

When I use the above code (built with Embarcadero's bcc32 compiler) on an Intel i7, passing 1000 (i.e. 1 ms), I measure the sleep with Poco's Timestamp to be about 1 ms. The code itself is executed in a thread.

The code looks like this:

    mDebugFile <<  std::setprecision (17) << mPulseEventTime.elapsed()/1000.0 << "\t" << 0 << "\n";
    setHigh(false);
    mDebugFile <<  std::setprecision (17) << mPulseEventTime.elapsed()/1000.0 << "\t" << 1 << "\n";       
    usleep(1000);
    mDebugFile <<  std::setprecision (17) << mPulseEventTime.elapsed()/1000.0 << "\t" << 1 << "\n";
    setLow(false);
    mDebugFile <<  std::setprecision (17) << mPulseEventTime.elapsed()/1000.0 << "\t" << 0 << "\n";

where mDebugFile is a std::fstream object and setLow/setHigh are calls into the hardware.

However, when the same code is executed on a Xeon CPU, the sleep measures about 10 ms. Assuming that Poco's timing function reports the correct time, 10 ms is quite far off when asking for 1 ms.
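
To rule out the stream formatting and Poco itself, the interval can also be cross-checked with QueryPerformanceCounter. A minimal sketch (measureSleep is just an illustrative name, wrapping the usleep from above):

    #include <windows.h>
    #include <iostream>

    // Cross-check the sleep length with the high-resolution performance counter.
    void measureSleep(__int64 usec)
    {
      LARGE_INTEGER freq, start, stop;
      QueryPerformanceFrequency(&freq);

      QueryPerformanceCounter(&start);
      usleep(usec);   // the waitable-timer sleep from above
      QueryPerformanceCounter(&stop);

      double elapsedMs = (stop.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
      std::cout << "requested " << usec / 1000.0 << " ms, got " << elapsedMs << " ms\n";
    }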

Is there any other way to get a reliable sleep of ~1 ms? Can the Windows OS be configured to give a more reliable sleep?

I don't have access to Boost or to C++11 (or later) features.

Totte Karlsson
  • The code includes the time spent formatting and logging to the debug stream, not just the sleep time. – dxiv Jun 09 '20 at 02:22
  • @dxiv Actually, the output shows that the debug logging takes less than a microsecond, as the time reported going from low to high (or vice versa) is "0". – Totte Karlsson Jun 09 '20 at 02:55
  • 10 msec is more "normal" than 1 msec; it depends on what browser you've got running, and Chrome is notorious. Real normal is 15.625 msec. You need to jack up the clock interrupt rate with [timeBeginPeriod(1)](https://learn.microsoft.com/en-us/windows/win32/api/timeapi/nf-timeapi-timebeginperiod). Bad practice, but it ought not matter much on power-hungry Xeon hardware. – Hans Passant Jun 11 '20 at 20:36

1 Answer


Windows is not a real-time operating system, so getting super precise sleep times in user code will never be guaranteed. There are a bunch of things in play.

  • The OS may shift expiration times to save power. This is called timer coalescing. Waking up once to handle a few events is more power efficient than waking more often at higher frequencies. It wouldn't be surprising if the OS uses different strategies for different types of hardware. (A sketch of how to ask for an uncoalesced timer follows this list.)

  • WaitForSingleObject (and related APIs) are still subject to the scheduler. When the object becomes signaled, the waiting thread becomes available for scheduling, but that doesn't mean it will be scheduled immediately. It depends on process and thread priorities, cores available, the system's quanta interval, the phase of the moon, etc.
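
Regarding both points above, here is a minimal sketch, assuming Windows 7 or later: SetWaitableTimerEx takes a TolerableDelay argument, and passing 0 asks the kernel not to coalesce this particular timer, while SetThreadPriority reduces (but does not eliminate) scheduling delay after the timer fires. The name usleep_uncoalesced is just illustrative:

    #include <windows.h>

    // Wait ~usec microseconds, asking for no timer coalescing and a high
    // scheduling priority. Neither request is a hard guarantee.
    void usleep_uncoalesced(__int64 usec)
    {
      SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);

      HANDLE timer = CreateWaitableTimer(NULL, TRUE, NULL);
      LARGE_INTEGER ft;
      ft.QuadPart = -(10 * usec);   // relative time in 100 ns units

      // Last argument is TolerableDelay (ms); 0 requests no coalescing.
      SetWaitableTimerEx(timer, &ft, 0, NULL, NULL, NULL, 0);
      WaitForSingleObject(timer, INFINITE);
      CloseHandle(timer);

      SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_NORMAL);
    }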

There are two APIs to increase the default timer resolution on Windows: NtSetTimerResolution (officially undocumented) and timeBeginPeriod. You can get pretty close to 1 ms intervals in user code using basic methods (e.g., Sleep), if the machine isn't overloaded.
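
For example, a minimal sketch of the timeBeginPeriod route (pulseOneMs is a made-up name, setHigh/setLow are the hardware calls from the question, and you need to link against winmm):

    #include <windows.h>   // timeBeginPeriod/timeEndPeriod live in winmm.lib

    void pulseOneMs()
    {
      timeBeginPeriod(1);   // request 1 ms timer resolution (system-wide!)
      // setHigh(false);    // hardware call from the question
      Sleep(1);             // or usleep(1000) with the waitable timer
      // setLow(false);
      timeEndPeriod(1);     // give back the default resolution
    }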

But lots of programs waste energy abusing these APIs. (If you're seeing 1 ms resolution on a machine, it's likely because some program has already boosted the timer resolution.) This has long been considered bad practice. The timer resolution is a system-wide setting, so it affects everything running on the machine. If you must increase the resolution, please do so only when needed and be sure to restore the default as soon as you're done.
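
One way to keep that promise is to scope the boost with a small RAII helper (TimerResolutionGuard is a made-up name; written without C++11 features, since the question rules those out):

    #include <windows.h>

    // Raises the timer resolution on construction and restores it on destruction,
    // so the boost cannot outlive the code that actually needs it.
    class TimerResolutionGuard
    {
    public:
      explicit TimerResolutionGuard(UINT ms) : mMs(ms) { timeBeginPeriod(mMs); }
      ~TimerResolutionGuard() { timeEndPeriod(mMs); }
    private:
      UINT mMs;
      TimerResolutionGuard(const TimerResolutionGuard&);            // non-copyable
      TimerResolutionGuard& operator=(const TimerResolutionGuard&); // (pre-C++11)
    };

    // Usage:
    // {
    //   TimerResolutionGuard guard(1);   // boost only for this block
    //   usleep(1000);
    // }                                  // resolution restored here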

If you need even more precision, I believe you need to make a kernel-mode driver, but I don't have any experience with that. At some point, Windows added an audio stack that guarantees latencies of 20 us or less. If I recall correctly, that required kernel mode shenanigans.

You might consider using a microcontroller (e.g., an Arduino) to generate your hardware signal and have your Windows program send commands to it via a serial interface.
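
A rough sketch of the serial side on Windows (the port name "COM3", the baud rate, and the one-byte 'P' command are all made up; the real protocol is whatever you implement on the microcontroller):

    #include <windows.h>

    // Send a one-byte "pulse" command; the microcontroller generates the 1 ms pulse.
    bool sendPulseCommand()
    {
      HANDLE port = CreateFileA("\\\\.\\COM3", GENERIC_READ | GENERIC_WRITE,
                                0, NULL, OPEN_EXISTING, 0, NULL);
      if (port == INVALID_HANDLE_VALUE)
        return false;

      DCB dcb = {0};
      dcb.DCBlength = sizeof(dcb);
      GetCommState(port, &dcb);
      dcb.BaudRate = CBR_115200;
      dcb.ByteSize = 8;
      dcb.Parity   = NOPARITY;
      dcb.StopBits = ONESTOPBIT;
      SetCommState(port, &dcb);

      const char cmd = 'P';
      DWORD written = 0;
      BOOL ok = WriteFile(port, &cmd, 1, &written, NULL);
      CloseHandle(port);
      return ok && written == 1;
    }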

Adrian McCarthy