5

This question is more out of curiousity than necessity:

Is it possible to rewrite the c code if ( !boolvar ) { ... in a way so it is compiled to 1 cpu instruction?

I've tried thinking about this on a theoretical level and this is what I've come up with:

if ( !boolvar ) { ...

would need to first negate the variable and then branch depending on that -> 2 instructions (negate + branch)

if ( boolvar == false ) { ...

would need to load the value of false into a register and then branch depending on that -> 2 instructions (load + branch)

if ( boolvar != true ) { ...

would need to load the value of true into a register and then branch ("branch-if-not-equal") depending on that -> 2 instructions (load + "branch-if-not-equal")

Am I wrong with my assumptions? Is there something I'm overlooking?

I know I can produce intermediate asm versions of programs, but I wouldn't know how to use this in a way so I can on one hand turn on compiler optimization and at the same time not have an empty if statement optimized away (or have the if statement optimized together with its content, giving some non-generic answer)

P.S.: Of course I also searched google and SO for this, but with such short search terms I couldn't really find anything useful

P.P.S.: I'd be fine with a semantically equivalent version which is not syntactical equivalent, e.g. not using if.


Edit: feel free to correct me if my assumptions about the emitted asm instructions are wrong.


Edit2: I've actually learned asm about 15yrs ago, and relearned it about 5yrs ago for the alpha architecture, but I hope my question is still clear enough to figure out what I'm asking. Also, you're free to assume any kind of processor extension common in consumer cpus up to AVX2 (current haswell cpu as of the time of writing this) if it helps in finding a good answer.

griffin
  • 1,261
  • 8
  • 24
  • Is this just for curiosity, or for actual performance? Modern processors are capable of fusing certain combinations of instructions into a single micro-op. (not sure if it's possible here though) – Mysticial Aug 29 '13 at 16:17
  • This isn't possible without 2 instructions, `cmp` `jne`. Even if you could turn this into a switch statement somehow it would still have to index then jump to the location of code to execute. It should be noted, there is `__builtin_expect` which can help. – Scotty Bauer Aug 29 '13 at 16:18
  • @Mysticial it's purely out of curiosity. – griffin Aug 29 '13 at 16:19
  • In the x86 architecture, no. But many others provide testing and jumping in one instruction. – wallyk Aug 29 '13 at 16:19
  • 2
    Actually, yes, you can use 1 instruction: `jmp [jumptable + boolvar * 4]`. But that's going to be slower than doing it the regular way (test and branch) – harold Aug 29 '13 at 16:19
  • "I'd be fine with a semantically equivalent version which is not syntactical equivalent, e.g. not using if." - Hum. Correct me if I'm wrong, but generally, assembly doesn't have `if`. –  Aug 29 '13 at 16:19
  • @H2CO3 You're right afaik, but I was refering to not using `if` in the c code, not what the asm would look like. – griffin Aug 29 '13 at 16:20
  • possible duplicate of [Compare and jump instruction as one instruction](http://stackoverflow.com/questions/13335585/compare-and-jump-instruction-as-one-instruction) –  Aug 29 '13 at 16:22
  • Judging by some of the incorrect things you say in the question (e.g. "load the value of true into a register and then branch (bne) depending on that "), it seems like you don't actually know any x86 assembly. What's the point of this question then? Shouldn't you learn x86 assembly before asking how to do something in it? – interjay Aug 29 '13 at 16:22
  • @harold could you post that as a possible answer, including how I would need to write my C code to produce this (not using inline asm of course, that would be cheating ;) ). Thx! – griffin Aug 29 '13 at 16:22
  • AFAIK you can't reliably emit that using C. For example, a `switch` commonly compiles to the equivalent of a "tree of nested `if`s" on many compilers. – harold Aug 29 '13 at 16:24
  • @interjay While I'm not sure my question is in itself correct, I have actually learned and used (written a compiler) asm, but that was about 5 years ago, so feel free to correct me where I'm wrong. Also, I've tried to not use actual asm because that would look different on different architectures (alpha for example), even though I specifically tagged the question as x86. And at last, the question was meant to be very theoretical and out of curiosity. – griffin Aug 29 '13 at 16:24
  • @griffin Obviously the answer is going to depend on the architecture... You need to pick one. There isn't going to be a platform-independent answer. I guess the architecture you did assembly on was not x86 because the things you said are completely wrong on x86 (e.g. there are no load or bne instructions, and you need to perform a `cmp` instruction, not a load). – interjay Aug 29 '13 at 16:28
  • 1
    @harold You could use a calculated goto in gcc, as long as it doesn't get optimized out. I think this would work, but I'm not at a computer now to test it: `goto {&elseLabel, &thenLabel}[!boolvar]` – ughoavgfhw Aug 29 '13 at 16:31
  • @interjay I initially learned asm on x86 (about 15yrs ago) and then re-learned it for alpha (about 5 yrs ago), and it seems you're right. I will edit my question and swap bne for jne, which seems to be the x86 version of "branch-if-not-equal". – griffin Aug 29 '13 at 16:31
  • @ughoavgfhw looks promising at least – harold Aug 29 '13 at 16:33
  • Then why ask a specific question about a subject that you don't remember the very basics of? This is like asking a specific trigonometry question when you don't know what multiplication is. – interjay Aug 29 '13 at 16:34
  • @interjay Sorry if I stepped on your toe by asking this. My viewpoint on this is till different, as, while I always forget the exact names of the asm instructions, I'm still pretty confident in knowing how the general concept of asm works (registers, branches, memory access, ...). You might of course say that I'm wrong there, but that's why I'm asking this question, including "Are my assumptions wrong?". So please, PLEASE (no sarcasm there), help me broaden my wisdom with a good answer, and feel free pointing out my errors there. – griffin Aug 29 '13 at 16:40

3 Answers3

3

At the end of my post it will say why you should not aim for this behaviour (on x86).

As Jerry Coffin has written, most jumps in x86 depend on the flags register.

There is one exception though: The j*cxz set of instructions which jump if the ecx/rcx register is zero. To achieve this you need to make sure that your boolvar uses the ecx register. You can achieve that by specifically assigning it to that register

register int boolvar asm ("ecx");

But by far not all compilers use the j*cxz set of instructions. There is a flag for icc to make it do that, but it is generally not advisable. The Intel manual states that two instructions

test ecx, ecx
jz ...

are faster on the processor.

The reason for being this is that x86 is a CISC (complex) instruction set. In the actual hardware though the processor will split up complex instructions that appear as one instruction in the asm into multiple microinstructions which are then executed in a RISC style. This is the reason why not all instructions require the same execution time and sometimes multiple small ones are faster then one big one.

test and jz are single microinstructions, but jecxz will be decomposed into those two anyways.

The only reason why the j*cxz set of instructions exist is if you want to make a conditional jump without modifying the flags register.

Sergey L.
  • 21,822
  • 5
  • 49
  • 75
  • Just to make sure I understand it correctly: This works under the assumption that `false` is defined as 0 (that's always the case unless redefined, right?) ? – griffin Aug 29 '13 at 16:47
  • @griffin In C (and your post is marked as C) `false == 0` any non-zero is `true`. It is not possible to redefine that. – Sergey L. Aug 29 '13 at 16:48
  • I was under the assumption that true and false are defined in stdbool.h (according to the C standard) and thus could be `#undef false` and e.g. `#define false 1`, but of course that would make no sense (and lead to a lot of problems probably). But thx, I think I get your answer now. I will wait a bit if there is a "better" answer (not saying yours is bad, but who knows ...) and then accept yours if there is not. Thx! – griffin Aug 29 '13 at 16:51
  • 1
    @griffin `stdbool.h` is a recent addition to the C standard, but it only defines `true` and `false` as macros that evaluate to `1` and `0` respectively, its a convenience thing. No matter what they are defined as `1 == 2` will always evaluate to `0` on the hardware and if you redefine `false` as `1` and put it into an `if` you will get the wrong branch. – Sergey L. Aug 29 '13 at 16:53
  • That's why I said it would make no sense to redefine it - but thx for the info on it being a recent addition, didn't know that. Upvoted your comment for helping me learn something new, thx! – griffin Aug 29 '13 at 17:01
  • Compilers don't use `jrcxz` because it's slower than a macro-fused `test ecx,ecx` / `jz` on most CPUs. That's 2 x86 instruction, but fuses in the decoders to a *single* uop (AMD since Bulldozer, Intel since Core2). And yes, `j*cxz` is slow for no apparent reason except on Bulldozer-family / Ryzen, where `j*cxz` and `loop` are single uop. [Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](//stackoverflow.com/q/35742570). `jecxz` does decode to 2 uops on Sandybridge-family, but it doesn't set FLAGS so it's not exactly test+jnz. – Peter Cordes Feb 17 '19 at 00:54
1

Yes, it's possible -- but doing so will depend on the context in which this code takes place.

Conditional branches in an x86 depend upon the values in the flags register. For this to compile down to a single instruction, some other code will already need to set the correct flag, so all that's left is a single instruction like jnz wherever.

For example:

boolvar = x == y;
if (!boolvar) {
    do_something();
}

...could end up rendered as something like:

    mov eax, x
    cmp eax, y    ; `boolvar = x == y;`
    jz @f
    call do_something
@@:

Depending on your viewpoint, it could even compile down to only part of an instruction. For example, quite a few instructions can be "predicated", so they're executed only if some previously defined condition is true. In this case, you might have one instruction for setting "boolvar" to the correct value, followed by one to conditionally call a function, so there's no one (complete) instruction that corresponds to the if statement itself.

Although you're unlikely to see it in decently written C, a single assembly language instruction could include even more than that. For an obvious example, consider something like:

    x = 10;
looptop:
    -- x;
    boolvar = x == 0;
    if (!boolvar)
        goto looptop;

This entire sequence could be compiled down to something like:

    mov ecx, 10
looptop:
    loop looptop
Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • Sadly, not that many instructions can be predicated. There's `cmov`, `fcmov`, jumps of course, and I can't think of any others, perhaps there are a couple.. – harold Aug 29 '13 at 16:32
  • @harold: `setcc` is the other x86 instruction that can use any of the condition-codes in FLAGS as a predicate. [`[v]cmpps` / ss / pd / sd](https://www.felixcloutier.com/x86/cmppd) takes a comparison predicate to produce an integer compare result. AVX512 integer comparisons also take predicates. (AVX2 and earlier only had `pcmpgt` and `pcmpeq` predicates available for packed-compare). So when auto-vectorizing, depending on the if body, sure `!boolvar` could be 1 or even 0 instructions. Or maybe Jerry was thinking of non-x86 ISAs like ARM, which \@phuclv's answer mentions. – Peter Cordes Feb 17 '19 at 01:04
  • `if (!boolvar) goto looptop;` C has `do{}while()` loop syntax to express the loop structure that's idiomatic for asm. [Why are loops always compiled into "do...while" style (tail jump)?](//stackoverflow.com/q/47783926). Oh, but you're intentionally writing it this way to replicate the `if()` from the question instead of changing it to a `}while()`. Anyway, worth mentioning that `loop` is not fast (on most CPUs). – Peter Cordes Feb 17 '19 at 01:08
1

Am I wrong with my assumptions

You are wrong with several assumptions. First you should know that 1 instruction is not necessarily faster than multiple ones. For example in newer μarchs test can macro-fuse with jcc, so 2 instructions will run as one. Or a division is so slow that in the same time tens or hundreds of simpler instructions may already finished. Compiling the if block to a single instruction doesn't worth it if it's slower than multiple instructions

Besides, if ( !boolvar ) { ... doesn't need to first negate the variable and then branch depending on that. Most jumps in x86 are based on flags, and they have both the yes and no conditions, so no need to negate the value. We can simply jump on non-zero instead of jump on zero

Similarly if ( boolvar == false ) { ... doesn't need to load the value of false into a register and then branch depending on that. false is a constant equal to 0, which can be embedded as an immediate in the instruction (like cmp reg, 0). But for checking against zero then just a simple test reg, reg is enough. Then jnz or jz will be used to jump on zero/non-zero, which will be fused with the previous test instruction into one

It's possible to make an if header or body that compiles to a single instruction, but it depends entirely on what you need to do, and what condition is used. Because the flag for boolvar may already be available from the previous statement, so the if block in the next line can use it to jump directly like what you see in Jerry Coffin's answer

Moreover x86 has conditional moves, so if inside the if is a simple assignment then it may be done in 1 instruction. Below is an example and its output

int f(bool condition, int x, int y)
{
    int ret = x;
    if (!condition)
        ret = y;
    return ret;
}

f(bool, int, int):
        test    dil, dil ; if(!condition)
        mov     eax, edx ; ret = y
        cmovne  eax, esi ; if(condition) ret = x
        ret

Some other cases you don't even need a conditional move or jump. For example

bool f(bool condition)
{
    bool ret = false;
    if (!condition)
        ret = true;
    return ret;
}

compiles to a single xor without any jump at all

f(bool):
        mov     eax, edi
        xor     eax, 1
        ret

ARM architecture (v7 and below) can run any instruction as conditional so that may translate to only one instruction

For example the following loop

while (i != j)
{
   if (i > j)
   {
       i -= j;
   }
   else
   {
       j -= i;
   }
}

can be translated to ARM assembly as

loop:   CMP  Ri, Rj         ; set condition "NE" if (i != j),
                            ;               "GT" if (i > j),
                            ;            or "LT" if (i < j)
        SUBGT  Ri, Ri, Rj   ; if "GT" (Greater Than), i = i-j;
        SUBLT  Rj, Rj, Ri   ; if "LT" (Less Than), j = j-i;
        BNE  loop           ; if "NE" (Not Equal), then loop
phuclv
  • 37,963
  • 15
  • 156
  • 475