How is speculative fault due to compiler optimization implemented under the hood?

Question

This is a follow-up to the question "Can the C compiler optimizer violate short-circuiting and reorder memory accesses for operands in a logical-AND expression?".

Consider the following code.

if (*p && *q) {
    /* do something */
}

Per the discussion at "Can the C compiler optimizer violate short-circuiting and reorder memory accesses for operands in a logical-AND expression?" (especially David Schwartz's comment and answer), the optimizer of a standard-conforming C compiler may emit CPU instructions that access *q before *p, while still maintaining the observable behaviour required by the sequence point that the && operator establishes.

Therefore, although the optimizer may emit code that accesses *q before *p, it still needs to ensure that any side effect of *q (such as a segmentation fault) is observable only if *p is non-zero. If *p is zero, a fault due to *q should not be observable: a speculative fault would occur first because *q is executed first on the CPU, but that fault would be discarded once *p is executed and found to be 0.

My question: How is this speculative fault implemented under the hood?

I would appreciate it if you could shed more light on the following points while answering this question.

As far as I know, when the CPU detects a fault, it generates a trap, that the kernel must handle (either take recovery action such as page swap, or signal the fault such as SIGSEGV to the process). Am I correct?
So if the compiler must emit code to perform speculative fault, it appears to me that the kernel and the compiler (and possibly the CPU too) must all cooperate with each other to implement speculative fault. How does the compiler emit instructions that would tell the kernel or the CPU that a fault generated due to the code should be considered speculative?

If `*q` is accessed *unconditionally* before or after the `if(*p && *q)`, the compiler may be able to conclude that the access cannot fault in a conforming program, and thus be able to reorder the accesses. — EOF, Jun 28 '16 at 11:12

score 3 · Accepted Answer · answered Jun 28 '16 at 10:14

It is implemented as part of the normal speculative fetching process. The result of a speculative fetch, whether it's a numerical result or a fault, is speculative. It is used if, and only if, it is later needed.

As far as I know, when the CPU detects a fault, it generates a trap, that the kernel must handle (either take recovery action such as page swap, or signal the fault such as SIGSEGV to the process). Am I correct?

The result of non-speculatively executing a fetch that produces a fault is a trap. The result of speculatively executing a fetch that produces a fault is a speculative trap, which actually occurs only if the result of the speculative fetch is used. If you think about it, speculative fetches would be impossible without this mechanism.

So if the compiler must emit code to perform speculative fault, it appears to me that the kernel and the compiler (and possibly the CPU too) must all cooperate with each other to implement speculative fault. How does the compiler emit instructions that would tell the kernel or the CPU that a fault generated due to the code should be considered speculative?

The compiler does it by placing the fetch for *q after a test on the result of *p. That signals the CPU that the fetch is speculative and that it can only use the results once the result of the test on the result of *p is known.

The CPU can, and does, perform the fetch of *q before it knows whether it needs it or not. This is nearly essential because a fetch can require inter-core operations which are slow -- you wouldn't want to wait any longer than needed. So modern multi-core CPUs implement aggressive speculative fetching.

This is what modern CPUs do. (The answer for CPUs with explicit speculative fetch operations is different.)
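
The ordering described above, with the fetch of *q placed after a test on the result of *p, can be sketched at the source level as follows. This is only an illustration under stated assumptions: `both_nonzero` and `check_both_nonzero` are hypothetical names, not part of the original question. Whether the CPU speculatively fetches *q underneath is not visible to the program. The sketch shows only that the fault is never observed when *p is zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: same logic as `*p && *q`, with the branch written out.
 * The load of *q sits behind the test on *p, so it is only architecturally
 * executed when *p is non-zero. */
int both_nonzero(const int *p, const int *q)
{
    if (*p == 0)
        return 0;       /* *q is never evaluated on this path */
    return *q != 0;     /* reached only when *p is non-zero */
}

/* Hypothetical self-check: passing an invalid q is fine as long as *p is 0,
 * because the abstract machine never dereferences q in that case. */
void check_both_nonzero(void)
{
    int zero = 0, one = 1;

    assert(both_nonzero(&zero, NULL) == 0);  /* no fault is observed */
    assert(both_nonzero(&one, &one) == 1);
    assert(both_nonzero(&one, &zero) == 0);
}
```

Calling `check_both_nonzero()` should complete without a fault, since the call that passes `NULL` never reaches the dereference of `q`.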

`The compiler does it by placing the fetch for *q after a test on the result of *p.` -- But the premise of my question is a scenario where compiler's optimizer emits code to fetch `*q` before fetching `*p`. I mean: My question is not about CPU executing instructions out of order. My question is about how the compiler maintains correct observable behaviour while re-ordering instructions (placing `*q` before `*p`). — Lone Learner, Jun 28 '16 at 10:32
Your premise is nonsense. If there is a detectable difference, then the compiler is broken. — gnasher729, Jun 28 '16 at 12:17
@LoneLearner Yes. The compiler emits code to fetch `*q` before `*p` by emitting the fetch instructions in the other order. The compiler maintains correct observable behavior because the CPU correctly implements speculative fetching. The compiler author understands how the CPU works and emits instructions that make the CPU do whatever it is the CPU does when it receives those instructions. They're not competing, they're 100% cooperating. — David Schwartz, Jun 28 '16 at 16:04

score 2 · Answer 2 · answered Jun 28 '16 at 12:16

In C and C++, you have the "as-if" rule, which means the compiler can do whatever it likes as long as the observable behaviour is what the language promises.

If the compiler generates code for an ancient processor without memory protection, where reading *q will read something (an unspecified value) without any side effects, then clearly it is allowed to read *q, and even exchange the order of the tests. Just as any compiler can swap the operands in (x > 0 || y > 0), provided y has a defined value or reading y with undefined value has no side effect.
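
As a rough illustration of the operand swap mentioned above, the sketch below compares both evaluation orders of (x > 0 || y > 0). The function names are hypothetical, and the sketch assumes both operands are plain int values with defined contents, so reading either one has no side effect.

```c
#include <assert.h>

/* Hypothetical: the expression as written in the source. */
int either_positive(int x, int y)
{
    return x > 0 || y > 0;
}

/* Hypothetical: the same test with the operands swapped, as a compiler
 * might evaluate it under the as-if rule. */
int either_positive_swapped(int x, int y)
{
    return y > 0 || x > 0;
}

/* Both orders give the same result for every pair in a small range,
 * so no conforming program can tell them apart. */
void check_equivalent(void)
{
    for (int x = -2; x <= 2; ++x)
        for (int y = -2; y <= 2; ++y)
            assert(either_positive(x, y) == either_positive_swapped(x, y));
}
```

The swap is only invisible because neither read can fault or change state. With *p and *q on a machine with memory protection, that assumption no longer holds automatically.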

But you are asking about speculative execution in the processor. Processors do execute instructions after a conditional branch before they know whether the branch was taken, but they make sure that this doesn't lead to any visible side effects. There is never any code for this; it is all within the CPU. If speculatively executed code does something that should generate a trap, the CPU waits until it knows for sure whether the branch was taken, and then it either takes the trap or it doesn't. Your code doesn't see it, and even the OS doesn't see it.

"*There is never any code for this, it is all within the CPU*" That's meaningless. If some code makes the CPU do something, then that code is the code for that something it made the CPU do. — David Schwartz, Jun 28 '16 at 16:05
