Why does serializing this program (CVE-2018-8120 proof of concept) cause it to work?

Question

When I run this proof-of-concept exploit for CVE-2018-8120 by unamer (https://github.com/unamer/CVE-2018-8120/tree/master/x64/Release) on my Windows 7 x64 machine with null page protection disabled, the machine gives me a 0x50 bugcheck. While debugging the code I noticed that it miraculously works every time, so I assumed that the code runs too fast when I use it without my debugger. I pinpointed where the program allocates the null page and placed a `cpuid` after it to serialize execution, and now it works flawlessly without crashing my system.

My question is: why is this the case? Is it really out-of-order execution accessing the null page before it's allocated? If so, why is this allowed to happen? I'd think this sort of design would produce erroneous results far too often to be allowed to exist.

What is the vulnerability, and what does this test program do? You're right that CPUs don't actually take faults when speculative execution encounters one, not until a faulting instruction reaches retirement so it's known to be non-speculative (not the result of an earlier mis-speculation such as a branch mispredict). And BTW, system calls are serializing, too: CPUs don't rename the privilege level. — Peter Cordes, Dec 21 '18 at 03:05
Unfortunately I might lack the understanding to clearly explain CVE-2018-8120, but I'll try my best. From my understanding there's a function in kernel space that can be made to read from a null pointer, which results in a crash normally. But if you map a fake tagKB structure to the null page it can be used to write memory with kernel privileges, which you can use as an EoP exploit. The test program demonstrates this by running a specified command prompt command with system privileges. — bcvdgfdag fewafdsaf, Dec 21 '18 at 03:25
I don't see why serialization could even possibly matter, unless there's some cross-CPU component to this within the kernel. The cardinal rule of out-of-order execution is that (for a single thread/core) it preserves the illusion of your code running in program order. Unless your kernel has TLB-invalidation bugs, there's nothing you can do that violates this. But you're saying allocating the memory is a separate system call, so that should be done (with correct TLB invalidation if necessary) before triggering the kernel bug this exploit is about. — Peter Cordes, Dec 21 '18 at 03:36
The only change I made to the program comes immediately after the null-page call to `NtAllocateVirtualMemory`: I overwrite `xor r9d, r9d` and `xor edx, edx` with `jmp 0x13FB61A92` (jumping to some unused `int3`s), where I placed `push rax`, `cpuid`, `pop rax`, `xor r9d, r9d`, `xor edx, edx`, and then `jmp 0x13FB61539` (the address immediately after my first `jmp`). So regardless of whether or not it should work, it somehow does. Note this was done on a very unpatched Windows 7 x64 SP1 (no Windows updates since May 2016 except the one that patches the ETERNALBLUE exploit), on an Intel 8700K. — bcvdgfdag fewafdsaf, Dec 21 '18 at 03:44
That all sounds totally fine. Executing newly-written instructions always works on current Intel CPUs, even without a serializing instruction or even a `jmp` like the x86 manuals suggest for future-proofing. [Observing stale instruction fetching on x86 with self-modifying code](https://stackoverflow.com/q/17395557). (I thought MS didn't even support Win7 on anything newer than Skylake. IDK if it's possible that there's some other bug interacting. But maybe that's still just in theory, and nothing actually breaks as long as you have drivers that work for the devices in a Coffee Lake system) — Peter Cordes, Dec 21 '18 at 04:01
The `cpuid` instruction changes most of the GP registers and takes at least `eax` as input. By dropping it in carelessly like this, you are tampering with some GP registers. Since `cpuid` is likely to zero those registers if a random `eax` is given (this is an implementation detail: *reserved* -> 0), it may zero a variable in a register that ends up in the structure passed to the kernel. In turn, this makes the NULL dereference possible. This is **a guess**; I don't think serialization matters here, it's simply out of context. — Margaret Bloom, Dec 21 '18 at 09:00
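
To make the register-clobbering point in the last comment concrete, here is a minimal sketch, assuming GCC or Clang on x86/x86-64 and their `<cpuid.h>` helper macro `__cpuid_count` (MSVC would use the `__cpuid` intrinsic from `<intrin.h>` instead). The names `cpuid_regs`, `run_cpuid`, and `cpuid_vendor` are hypothetical helpers introduced only for illustration. It shows that `cpuid` writes EAX, EBX, ECX, and EDX, so a patch that saves and restores only `rax` still leaves the other three changed. It does not show that this is what makes the exploit work.

```c
#include <cpuid.h>   /* GCC/Clang x86 helper macros; an assumption about the toolchain */
#include <string.h>

/* Hypothetical container for the four registers CPUID writes. */
typedef struct {
    unsigned int eax, ebx, ecx, edx;
} cpuid_regs;

/* Execute CPUID with EAX = leaf and ECX = subleaf, and return all four outputs. */
static cpuid_regs run_cpuid(unsigned int leaf, unsigned int subleaf) {
    cpuid_regs r;
    __cpuid_count(leaf, subleaf, r.eax, r.ebx, r.ecx, r.edx);
    return r;
}

/* Leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX (in that order),
 * which makes it easy to see that registers other than EAX are overwritten. */
static void cpuid_vendor(char out[13]) {
    cpuid_regs r = run_cpuid(0, 0);
    memcpy(out + 0, &r.ebx, 4);
    memcpy(out + 4, &r.edx, 4);
    memcpy(out + 8, &r.ecx, 4);
    out[12] = '\0';
}
```

Calling `cpuid_vendor(buf)` with a 13-byte buffer and printing `buf` gives the vendor ID of the CPU it runs on. In the patched sequence from the question, `edx` is re-zeroed by the relocated `xor edx, edx`, but whatever was in `rbx` and `rcx` before the `cpuid` is gone, which is one way such a patch could change program behavior independently of serialization.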
