Power preserving SpinWait

Question

I have a polling loop in C# that needs to poll every 100 microseconds on average (assuming, of course, that Windows does not preempt the thread excessively because of a core shortage).

As there is no time for a reschedule, Sleep(1) will not do.

So I decided to dedicate a thread (and in practice a core, by setting affinity) and use Thread.SpinWait for a set number of cycles in each iteration. While this works fine, it consumes an unnecessary amount of power. 100 microseconds would be plenty of time for the CPU to pause (though not enough to have the thread temporarily removed from the scheduler, as the Windows time slice is far too long).

Instead, I was thinking of using the Intel PAUSE instruction, but I'm not sure it will cause the CPU to suspend the hardware thread. Intel claims it saves power and should be used in spin loops, but as the pause is as long as 100 microseconds, I really want the core to go into a C1 sleep state.

Any ideas?

Edit: I'm polling a third party API, so there is no synchronization event to block on.

Why do you have such tight time constraints? Are you interfacing with hardware that is putting these limitations on you, and if so, what hardware is it? — Scott Chamberlain, Aug 07 '12 at 00:48
Hmm, how did you tell the kernel that it can't use that core? This can't work in practice, you *will* lose the processor for 45 msec or more. Except when you test it, then it looks like it works. Write C++/CLI code to get _mm_pause. — Hans Passant, Aug 07 '12 at 01:06
@Hans - Windows is free to use the core, but our software runs one custom scheduler for each logical core on a dedicated machine. The software uses cooperative threading. There are simply too few threads for Windows to schedule to make context switching a problem. We have over a million logical threads, but they yield cooperatively, so there are no preemptive threads in Windows. Observe that there is no requirement for true real time. It is the average latency per call that matters to us. — Jack Wester, Aug 07 '12 at 01:25
@Hans - The dedicated server would typically run 16 threads (custom schedulers) on a dedicated 24 core machine. Each scheduler runs some C# code until there is blocking I/O or a cooperative yield. This means that the user C# threads are not visible to Windows as they are not OS threads. This means that Windows deals with 16 running threads. It will not bother to schedule anything else on the cores. The 17th thread will enjoy the same undisturbed freedom. And even if Windows did schedule some of its own threads on one of the busy cores, it would not affect average latencies one bit. — Jack Wester, Aug 07 '12 at 02:02
Try using `select` for the shortest possible amount of time on one of the std handles; `fileno( stdout );` or `stdin` or `stderr` and see how Windows behaves. — JimR, Aug 17 '12 at 11:55

valdo · Accepted Answer · 2012-08-07T15:40:39.337

Naturally, using synchronization primitives and timers is the preferred way to avoid a CPU- and power-hungry busy wait. However, if you need to poll this frequently, there's no way to achieve it by conventional means, at least in user mode.

One simple thing you can do is include a pause CPU instruction in your loop. In the Windows SDK it is exposed as the YieldProcessor() macro, which on x86/x64 emits the PAUSE instruction.
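A minimal C++ sketch of such a loop, assuming an x86/x64 target where the `_mm_pause` intrinsic is available. The names `CPU_RELAX` and `spin_pause_for` are hypothetical, not part of any library, and to call this from C# it would still have to be exposed through C++/CLI or a native DLL. Note that this only softens the busy wait; the thread never leaves the core.

```cpp
#include <chrono>
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>      // _mm_pause on MSVC
#define CPU_RELAX() _mm_pause()
#elif defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>   // _mm_pause on GCC/Clang
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)  // assumption: no PAUSE equivalent on this target
#endif

// Hypothetical helper: spin for at least `interval`, issuing PAUSE on
// every iteration. Returns the iteration count (for diagnostics only).
inline std::uint64_t spin_pause_for(std::chrono::microseconds interval) {
    const auto deadline = std::chrono::steady_clock::now() + interval;
    std::uint64_t spins = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        CPU_RELAX();
        ++spins;
    }
    return spins;
}
```

A polling loop would call `spin_pause_for(std::chrono::microseconds(100))` between two calls to the polled API.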

Beyond this, anything better is probably possible only in kernel-mode programming. There you may use a high-precision multimedia timer.

Edit:

About SetWaitableTimer: this may be an option. Unlike the "traditional" Win32 waiting functions (such as Sleep and WaitForSingleObject), it takes a high-precision timeout as a parameter, expressed in 100-nanosecond units.

However, user-mode timers are asynchronous in nature. Let's assume the timer is signaled with high precision (though this is not obvious; the "traditional" Win32 waiting functions are accurate only to the tick quantum, on the order of tens of milliseconds). Once the timer is signaled, it releases the waiting thread(s). But the thread scheduler doesn't have to run that thread immediately: it may wait for the next time slice, or delay the thread even longer if there are competing threads.

In conclusion: the idea seems worth trying. But I won't be surprised if this is more-or-less equivalent to using Sleep.
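For anyone who wants to try it, here is a minimal C++ sketch of a one-shot waitable-timer wait. The helper names `relative_due_time` and `wait_on_timer` are hypothetical. The Win32 part compiles only on Windows, and nothing here guarantees that the thread actually resumes after 100 microseconds; it only shows how the request is expressed.

```cpp
#include <cstdint>

// SetWaitableTimer takes its due time in 100-nanosecond units;
// a negative value means "relative to now".
constexpr std::int64_t relative_due_time(std::int64_t microseconds) {
    return -(microseconds * 10);
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: block the calling thread on a one-shot timer.
// Returns false if the timer could not be created, armed, or waited on.
inline bool wait_on_timer(std::int64_t microseconds) {
    // Unnamed, manual-reset timer with default security attributes.
    HANDLE timer = CreateWaitableTimerW(nullptr, TRUE, nullptr);
    if (timer == nullptr) return false;

    LARGE_INTEGER due;
    due.QuadPart = relative_due_time(microseconds);

    // Period 0 = fire once; no completion routine; do not resume the system.
    bool ok = SetWaitableTimer(timer, &due, 0, nullptr, nullptr, FALSE) != 0
           && WaitForSingleObject(timer, INFINITE) == WAIT_OBJECT_0;
    CloseHandle(timer);
    return ok;
}
#endif
```

Timing a few thousand calls to `wait_on_timer(100)` and comparing the average against `Sleep(1)` would show whether the scheduler honors the shorter interval on a given machine.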

Agreed. A synchronization primitive would be appropriate, but I'm polling an external API, so nothing of the kind is available in this case. — Jack Wester, Aug 07 '12 at 13:43
To do a PAUSE, I could use the _mm_pause() intrinsic in emmintrin.h. The problem is that I really want a longer and more efficient pause than PAUSE, one that allows the CPU to put the core to sleep for a longer period of time (100 microseconds). PAUSE does help, but issuing it in a spinning loop does not save much power compared to the deeper sleep that would be possible when waiting for many microseconds. — Jack Wester, Aug 07 '12 at 13:47
I wonder what NtDelayExecution() in ntdll.dll does. The name sounds intriguing. — Jack Wester, Aug 07 '12 at 13:49
Our internal guru suggested SetWaitableTimer. I'll give it a try. — Jack Wester, Aug 07 '12 at 13:55

Necrolis · Answer 2 · 2012-08-07T14:23:32.727


Seeing as your waits are lengthy, you might find the SSE3 memory-region monitor instructions (MONITOR/MWAIT) far more efficient in terms of power savings. They are not really designed as a blocking spin-wait but as an alertable wait, but they may still prove a viable alternative.

Getting this into C# will require an external DLL, however, that provides an interface to the C++ intrinsics (_mm_mwait and _mm_monitor).

@JackWester: unfortunately it seems so. MS doesn't mention these restrictions; only the Intel developer manuals do. – Necrolis Aug 09 '12 at 06:45
