There are basically three ways that this can be done. I'll go over them one at a time.
One way to do it is basically what you describe in the question: do the comparison, and then branch to code that separately implements both possibilities. For example:
cmp [a], 13 ; compare 'a' to 13, setting flags like subtraction
jge GreaterThanOrEqual ; jump if 'a' >= 13, otherwise fall through
mov eax, [x * 0 * sizeof(x[0])] ; EAX = x[0]
jmp Next ; EAX now loaded with value, so do unconditional jump
GreaterThanOrEqual:
mov eax, [x * 1 * sizeof(x[0])] ; EAX = x[1]
; EAX now loaded with value; fall through
Next:
mov [y], eax ; store value of EAX in 'y'
Normally, a compiler would try and keep more values in registers, but this should give you an idea of the basic logic. It does the comparison, and either branches to an instruction that reads/loads x[1]
or falls through to an instruction that reads/loads x[0]
. Then, it moves onto an instruction that stores that value into y
.
You should be able to see that this is relatively inefficient because of all the branching that's required. As such, optimizing compilers won't generate code like this, especially in the simple case where you have a basic ternary expression:
(a >= 13) ? 1 : 0
or even:
(a >= 13) ? 125 : -8
There are bit-twiddling tricks that can be used to do this comparison and get the corresponding integer without ever having to do a branch.
This brings us to the second way of doing it, which is to use the SETcc
instruction. The cc
part stands for "condition code", and all of the condition codes are the same as those for the conditional-jump instructions. (In fact, you could write all of the conditional-jump instructions as Jcc
.) For example, jge
means "jump if greater-than-or-equal"; similarly, setge
means "set if greater-than-or-equal". Simple.
The trick about SETcc
is that it sets a BYTE-sized register, which basically means AL
, CL
, DL
, or BL
(there are more options; you could set the high byte of one of those registers, and/or in 64-bit long mode there are more options, but these are the basic choices for operands).
Here is an example of the code implementing this strategy:
xor edx, edx ; clear EDX
cmp [a], 13 ; compare 'a' to 13, setting flags like subtraction
setge dl ; set DL to 1 if greater-than-or-equal, or 0 otherwise
mov eax, [x * edx * sizeof(x[0])]
mov [y], eax
Cool, right? Branch eliminated. The required 0 or 1 is loaded directly into DL
, and then this is used as part of the load (MOV
instruction).
The only thing slightly confusing here is that you need to know that DL
is the low byte of the full 32-bit EDX
register. This is why we needed to pre-clear the full EDX
, since setge dl
only affected the low byte, but we want the full EDX
to be either 0 or 1. It turns out that pre-zeroing the full register is the most optimal way of doing this on all processors, but there are other ways, like using MOVZX
after the SETcc
instruction. The linked answer goes into great detail about that, so I won't belabor it here. The key point is just that SETcc
only sets the low byte of a register, but subsequent instructions require the entire 32-bit register to have the value, so you need to eliminate garbage in the upper bytes.
Anyway, this is the code that compilers will generate 99% of the time when you write something like y = x[a >= 13]
. The SETcc
instruction gives you a way to set a byte according to the status of one or more flags, just like you could branch on the flags. This would basically be what you were thinking of an instruction that allows accessing the flags directly.
This implements the logic for
(a >= 13) ? 1 : 0
but what if you wanted to do
(a >= 13) ? 125 : -8
like I mentioned earlier? Well, you still use the SETcc
instruction, but you do a little bit of fancy bit-twiddling afterwards to "fix up" the resulting 0 or 1 to the values you actually want. For example:
xor edx, edx
cmp [a], 13
setge dl
dec edx
and dl, 123
add edx, 125
; do whatever with EDX
This works for just about any binary choice (two possible values, depending on a condition), and optimizing compilers are smart enough to work this out. Still branchless code; very cool.
There is a third way that this could be implemented, but it is conceptually very similar to the second way that we've just been talking about. It uses a conditional move instruction, which is just another way of doing a branchless set based on the status of flags. The conditional move instruction is CMOVcc
, where cc
again refers to "condition code", exactly as it has in the previous examples. The CMOVcc
instruction was introduced with the Pentium Pro circa 1995, and has been in all processors since (well, not the Pentium MMX, but Pentium II and later), so essentially everything that you would ever see today.
The code is very similar, except that it's—as the name suggests—a conditional move, so a bit more preliminary setup is required. Specifically, you need to get the candidate values loaded into registers, so that you can select the right one:
xor edx, edx ; EDX = 0
mov eax, 1 ; EAX = 1
cmp [a], 13 ; compare 'a' to 13 and set flags
cmovge edx, eax ; EDX = (a >= 13 ? EAX : EDX)
mov eax, [x * edx * sizeof(x[0])]
mov [y], eax
Notice that the move of EAX
into EDX
is conditional—it happens only if the flags indicate the condition ge
(greater-than-or-equal-to). So, it works out to the basic C ternary operation, as noted in the comment to the right of the instruction. If the flags indicate ge
, then EAX
is moved into EDX
. Otherwise, nothing is moved, and EDX
keeps its original value.
Note that, although certain compilers (Intel's compiler specifically, known as ICC) will prefer CMOV
instructions over SET
instructions, this has no advantage over the previous implementation that we saw earlier with SETGE
. In fact, it is really sub-optimal.
When CMOV
really comes into its own is to allow you to eliminate that bit-twiddling code required to get values other than the good old 0 or 1. For example:
mov edx, -8 ; EDX = -8
mov eax, 125 ; EAX = 125
cmp [a], 13 ; compare 'a' to 13 and set flags
cmovge edx, eax ; EDX = (a >= 13 ? EAX : EDX)
; do whatever with EDX
This is now fewer instructions, because the correct value was moved directly into the EDX
register, as opposed to setting either 0 or 1 and then having to manipulate it into the values we want. Compilers will therefore use CMOV
instructions (when targeting processors that support them, as already mentioned) to implement more complex logic like
(a >= 13) ? 125 : -8
even though they could do them using one of the other approaches. You also need conditional moves when the operands on either side of the condition are not compile-time constants (i.e., they are values in registers, known only at run-time).
Does that help? :-)
I've tried to look at what gcc spits out for this code example but it's impossible to pick out the logic from within all of the extra junk it throws in.
Yeah. I have a couple of hints for you:
Pare down your code to a very simple function that only does what you want to study. You'll want to take an input as a parameter (so that the optimizer can't trivially fold a constant), and you'll want to return the output from the function. For example:
int Foo(int a)
{
return a >= 13;
}
Returning a bool
would have also worked here. If you'd used a conditional operator to return something other than 0 or 1, you'd need to return an int
, of course.
Either way, now you can see exactly what assembly instructions the compiler is generating to implement this, without any other noise. Make sure that you've enabled optimizations; looking at debugging code is not instructive and very noisy.
Make sure that you are asking GCC to generate assembly listings using the Intel/MASM format, which is much easier to read (in my opinion, at least) than its default format, the GAS/AT&T syntax. All of my assembly code examples above are written using Intel syntax. The required incantation would be:
gcc -S -masm=intel MyFile.c
where -S
generates an assembly listing for the input source code file, and -masm=intel
switches the assembly-listing syntax format to Intel style.
Use a nice tool like the Godbolt Compiler Explorer, which automates all of this to substantially decrease the turn-around time. As another bonus, it color-codes the assembly instructions to match with the lines of C code in the original source.
Here is an example of what you'd use to study this. The original source is on the far left. The middle pane shows GCC 7.1's assembly output for a modern processor, which supports CMOV
instructions. The far right pane shows GCC 7.1's assembly output for a very old processor that does not support CMOV
instructions. Cool, right? And you can easily manipulate the compiler switches and watch how the output changes. For example, if you do -m64
(64-bit) instead of -m32
(32-bit), then you'll see that the parameter is passed in a register (EDI
), rather than being passed on the stack and having to be loaded into a register as the very first instruction in the function.