For university, I'm currently experimenting with the MONITOR/MWAIT instruction pair. Specifically, I want to measure how much energy the CPU uses in different scenarios, and I have already programmed a test setup that works reasonably well. As part of the setup, I have all cores enter MWAIT and then use an NMI to wake them up again after a specified time. So far everything has worked fine, but now I want to test how the power management hints affect power consumption.
Unfortunately, every hint other than 0 seems to cause MWAIT to not wait for the NMI but to wake up on its own after 3-4 ms. As far as I understand the documentation, the power management hints should not have any impact on when execution continues after the MWAIT, so this is quite strange. Since I still haven't made any progress after spending a few hours on this problem, I thought maybe someone here has an idea what is going on.
Here is how I use MONITOR/MWAIT in my code:
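volatile int dummy; /* never read or written; only gives MONITOR a valid address */

/* Runs on every core: arm MONITOR on dummy, then MWAIT with hint 0x10 in EAX. */
void do_mwait() {
    asm volatile("monitor;" ::"a"(&dummy), "c"(0), "d"(0));
    asm volatile("mwait;" ::"a"(0x10), "c"(0));
}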
This is obviously just a small excerpt of the Linux kernel module I've written, but it should contain all the important points.
dummy is a variable that is never used outside of what you can see here. It only exists so that I have a valid address to pass to monitor. do_mwait() is the function that gets executed on every core available while I do my measurements.
As I said, just exchanging the 0x10 in the second line of do_mwait() with 0 makes it work the way I expect.
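For context, this is how I understand the encoding of the hint value in EAX, based on my reading of the MWAIT description in the Intel SDM: bits 7:4 select the target C-state (0 means C1, 1 means C2, and so on, with 0xF meaning C0) and bits 3:0 select the sub C-state. The helper below is only a sketch of that reading. decode_mwait_hint and struct mwait_hint are names made up for illustration and are not part of my module, and the C-state numbers are MWAIT's own enumeration, which may not match the C-state names used elsewhere.

```c
#include <stdint.h>

/* Hypothetical illustration type: the two fields of an MWAIT EAX hint. */
struct mwait_hint {
    unsigned cstate;   /* MWAIT C-state number (0 = C0, 1 = C1, ...) */
    unsigned substate; /* sub C-state within that C-state */
};

/* Split an MWAIT hint into its fields, assuming the SDM layout:
 * EAX[7:4] = target C-state minus 1 (0xF wraps around to C0),
 * EAX[3:0] = sub C-state. */
static struct mwait_hint decode_mwait_hint(uint32_t eax)
{
    struct mwait_hint h;
    unsigned field = (eax >> 4) & 0xF;

    h.cstate = (field + 1) & 0xF; /* 0 -> C1, 1 -> C2, ..., 0xF -> C0 */
    h.substate = eax & 0xF;
    return h;
}
```

Under that reading, decode_mwait_hint(0x10) yields C-state 2 with sub-state 0, while decode_mwait_hint(0) yields C-state 1 with sub-state 0.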
Because the behaviour and supported features of MONITOR/MWAIT depend on the specific CPU model, here are all the parts of the cpuid output on my test machine that I think are relevant. As far as I can see, all necessary features should be supported:
CPU 0:
vendor_id = "GenuineIntel"
version information (1/eax):
processor type = primary processor (0)
family = 0x6 (6)
model = 0xc (12)
stepping id = 0x3 (3)
extended family = 0x0 (0)
extended model = 0x3 (3)
(family synth) = 0x6 (6)
(model synth) = 0x3c (60)
(simple synth) = Intel Core (unknown type) (Haswell C0) {Haswell}, 22nm
...
feature information (1/ecx):
...
MONITOR/MWAIT = true
...
...
MONITOR/MWAIT (5):
smallest monitor-line size (bytes) = 0x40 (64)
largest monitor-line size (bytes) = 0x40 (64)
enum of Monitor-MWAIT exts supported = true
supports intrs as break-event for MWAIT = true
number of C0 sub C-states using MWAIT = 0x0 (0)
number of C1 sub C-states using MWAIT = 0x2 (2)
number of C2 sub C-states using MWAIT = 0x1 (1)
number of C3 sub C-states using MWAIT = 0x2 (2)
number of C4 sub C-states using MWAIT = 0x4 (4)
number of C5 sub C-states using MWAIT = 0x0 (0)
number of C6 sub C-states using MWAIT = 0x0 (0)
number of C7 sub C-states using MWAIT = 0x0 (0)
...
brand = "Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz"
...
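To double-check that the hint I use is actually enumerated, the sub C-state counts above can be compared against the hint. The sketch below assumes that CPUID leaf 5 reports these counts in EDX as 4-bit fields (bits 3:0 for C0, bits 7:4 for C1, and so on). EXAMPLE_EDX is not read from the hardware here; it is a value I reconstructed from the listing above, and the function names are hypothetical.

```c
#include <stdint.h>

/* EDX of CPUID leaf 5, reconstructed from the listing above (assumption):
 * C0=0, C1=2, C2=1, C3=2, C4=4, C5..C7=0. */
#define EXAMPLE_EDX 0x00042120u

/* Number of sub C-states enumerated for MWAIT C-state `cstate` (0..7). */
static unsigned mwait_substates(uint32_t edx, unsigned cstate)
{
    if (cstate > 7)
        return 0; /* only eight 4-bit fields fit into EDX */
    return (edx >> (cstate * 4)) & 0xF;
}

/* Returns 1 if the hint's sub C-state is enumerated for its C-state,
 * assuming EAX[7:4] = target C-state minus 1 and EAX[3:0] = sub C-state. */
static int mwait_hint_enumerated(uint32_t edx, uint32_t hint)
{
    unsigned cstate = (((hint >> 4) & 0xF) + 1) & 0xF;
    unsigned substate = hint & 0xF;

    return substate < mwait_substates(edx, cstate);
}
```

With the reconstructed value, mwait_hint_enumerated(EXAMPLE_EDX, 0x10) and mwait_hint_enumerated(EXAMPLE_EDX, 0) both return 1, which is why I believe the hint itself is supported on this machine.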
I hope this is enough context. Please tell me if I need to share additional information. And thanks in advance for any input, even if it's just an (educated) guess!