CMOV in inline assembly does not work as expected

Question

I am trying to figure out how to use the CMOV instruction correctly. As far as I know, CMOV is used just like the conditional jump instructions: it moves (instead of jumping) depending on the flags set by a preceding test or cmp instruction:

test %ecx, %ecx
cmovz %ebx, %eax

So I tried using this to create a small function that returns either old_val or new_val, depending on whether the test variable pred is 0:

uint32_t cmov(uint8_t pred, uint32_t old_val, uint32_t new_val)
{
    uint32_t result = 0;
    __asm__ __volatile__(
        "mov %2, %0;"
        "test %1, %1;"
        "cmovz %3, %0;"
        : "=r"(result)
        : "r"(pred), "r"(old_val), "r"(new_val)
        : "cc");
    return result;
}

But when I compile and run this, it always returns old_val. Where is my mistake?

Clearly, this can be done in C/C++ with a simple if but I want to do it with CMOV.

Full example:

#include <cstdint>
#include <iostream>

uint32_t cmov(uint8_t pred, uint32_t old_val, uint32_t new_val)
{
    uint32_t result = 0;
    __asm__ __volatile__(
        "mov %2, %0;"
        "test %1, %1;"
        "cmovz %3, %0;"
        : "=r"(result)
        : "r"(pred), "r"(old_val), "r"(new_val)
        : "cc");
    return result;
}

int main()
{
    uint32_t old_val = 1;
    uint32_t new_val = 5;
    for (uint8_t pred = 0; pred < 2; pred++)
    {
        uint32_t result = cmov(pred, old_val, new_val);

        std::cout << "\npred: " << pred
                << "\nold_val: " << old_val << "\nnew_val: " << new_val
                << "\nresult: " << result << std::endl;
    }
    return 0;
}

If you want gcc to use `cmov`, use `return pred ? old_val : new_val`. That usually compiles to branchless code (although `if` can too, if the compiler decides it's a good idea, especially with profile-guided optimization; see [gcc optimization flag -O3 makes code slower than -O2](https://stackoverflow.com/q/28875325)). And BTW, if you care about efficiency for `std::cout <<`, use `'\n'` instead of `std::endl`. — Peter Cordes, Jun 28 '18 at 08:41
Anyway, I haven't spotted the bug in your inline asm yet. Try looking at the compiler's complete asm output. But if you're hoping to get more efficient code, this is extremely unlikely to be optimal; inline asm defeats optimizations like constant propagation (https://gcc.gnu.org/wiki/DontUseInlineAsm), and the way you're using it forces an extra `mov` instead of using constraints like `result = old_val;` `asm("..." : "+r"(result) : "r"(pred), "r"(new_val));`. Also, `asm volatile` can't be optimized away even if the compiler doesn't need the result, or wants to hoist it out of a loop. — Peter Cordes, Jun 28 '18 at 08:43
Oh, looking at the compiler output (https://godbolt.org/g/77g9e8) makes it obvious. You forgot to use an early-clobber, so gcc uses `%ebx` for `new_val` *and* for `result`. — Peter Cordes, Jun 28 '18 at 08:49
@PeterCordes I do not want to do it for performance reasons, and I know that the compiler is aware of CMOV. See it just as an exercise :) Thank you for the answer. — moschn, Jun 28 '18 at 09:00
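
For reference, the fixes suggested in the comments could look roughly like the sketch below. It assumes GCC or Clang targeting x86 with the default AT&T syntax. The function names `cmov_earlyclobber`, `cmov_inout` and `cmov_ternary` are made up for illustration, widening `pred` to 32 bits is a choice made here and is not part of the original code, and the non-x86 fallback is only there so the snippet still compiles on other targets. Without the early-clobber, the compiler is allowed to put `result` and `new_val` in the same register, so the initial `mov` overwrites `new_val` before `cmovz` reads it.

```cpp
#include <cstdint>

// Variant 1: keep the original three-instruction sequence, but mark the
// output as early-clobber ("=&r") so it never shares a register with an input.
uint32_t cmov_earlyclobber(uint8_t pred, uint32_t old_val, uint32_t new_val)
{
    uint32_t result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__(
        "mov %2, %0\n\t"    // result = old_val
        "test %1, %1\n\t"   // sets ZF if pred == 0
        "cmovz %3, %0"      // if ZF: result = new_val
        : "=&r"(result)
        // pred is widened to 32 bits (an assumption of this sketch) so that
        // all operands are 32-bit registers.
        : "r"(static_cast<uint32_t>(pred)), "r"(old_val), "r"(new_val)
        : "cc");
#else
    result = pred ? old_val : new_val;  // portable fallback, not inline asm
#endif
    return result;
}

// Variant 2: let the compiler do the initial copy and use a read-write
// operand ("+r"), as suggested in the comments. No extra mov in the asm,
// and no early-clobber needed because %0 is only written by the last
// instruction.
uint32_t cmov_inout(uint8_t pred, uint32_t old_val, uint32_t new_val)
{
    uint32_t result = old_val;
#if defined(__i386__) || defined(__x86_64__)
    __asm__(
        "test %1, %1\n\t"   // sets ZF if pred == 0
        "cmovz %2, %0"      // if ZF: result = new_val
        : "+r"(result)
        : "r"(static_cast<uint32_t>(pred)), "r"(new_val)
        : "cc");
#else
    if (!pred) result = new_val;        // portable fallback, not inline asm
#endif
    return result;
}

// Variant 3: plain C++; the compiler is free to emit cmov here itself.
uint32_t cmov_ternary(uint8_t pred, uint32_t old_val, uint32_t new_val)
{
    return pred ? old_val : new_val;
}
```

With the values from the question (`old_val = 1`, `new_val = 5`), each variant should return 5 for `pred == 0` (the zero flag is set, so `cmovz` takes `new_val`) and 1 for `pred == 1`. The `volatile` qualifier is dropped in this sketch because the asm has no side effects beyond its output.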
