
I wrote this minimal C++ code that takes three numbers a, b and c and a bitmask r. It computes a result L, which should equal c if the second bit of r is set, otherwise b if the first bit of r is set, and otherwise a (when neither of the first two bits of r is set). I want to optimize it with inline assembly and compile it with GCC (g++). This is my code:

#include <cstdio>
#include <cstdlib>
int main(){
    uint a=1;
    uint b=2;
    uint c=3;
    uint r=1;
    uint L;
    asm(
        "mov %2,%0;"
        "bt $0,%1;"
        "cmovc %3,%0;"
        "bt $1,%1;"
        "cmovc %4,%0;"
        : "=r" (L)
        : "r" (r), "r" (a), "r" (b), "r" (c)
    );
    printf("%d\n",L);
    return 0;
}

With the setup above, L should equal b. However, no matter which options I compile it with, the printed value is always 3, i.e. c. Why is that, and how do I write this program correctly?

EDIT: This question has already been answered here, but I still want to post an answer because it may help others. Since I am not allowed to post it as a proper answer, I will write it here:

It turns out that the code works fine unless I use the -O3 flag. With -O3, the compiler does the following:
In this minimal example, it decides to store a and r in the same register, and then it places L in the register of a or b (I am not sure which). Either way, registers get overwritten that shouldn't be.
In my actual code, where I wanted to apply this assembly, L is a reference passed as a function argument. There the compiler decided to store one of a, b or c in L as an optimization, completely ignoring that L already has a value.
This happens because the asm statement gives the compiler no reason to keep the value of L in place: I declared it as "=r" (write-only) instead of "+r" (read-write).
Also, r should be moved to the output operands, again with "+r": even though bt won't change it, it is still treated as an output operand.
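
For anyone who wants to try the fix described above, here is a minimal sketch of it, not a definitive version. The function name select_asm is hypothetical. The sketch makes two choices that go beyond what the post says: it uses "+&r" (read-write plus early-clobber) because L is written by the first cmovc before the remaining inputs are read, and it leaves r as a plain input on the assumption that bt with a register operand only reads it and sets the carry flag. The "cc" clobber follows the suggestion in the comments. The non-x86 branch is only a fallback so that the sketch builds elsewhere.

```cpp
// Hypothetical helper (not from the original post): returns c if bit 1 of r
// is set, otherwise b if bit 0 of r is set, otherwise a.
unsigned select_asm(unsigned a, unsigned b, unsigned c, unsigned r) {
    unsigned L = a;                  // L gets its default value in C++, not in the asm
#if defined(__x86_64__) || defined(__i386__)
    asm(
        "bt $0, %1\n\t"              // CF = bit 0 of r
        "cmovc %2, %0\n\t"           // if CF is set: L = b
        "bt $1, %1\n\t"              // CF = bit 1 of r
        "cmovc %3, %0"               // if CF is set: L = c
        : "+&r"(L)                   // read-write; '&' keeps L out of the input registers
        : "r"(r), "r"(b), "r"(c)     // assumption: bt only reads r, so it stays an input
        : "cc"                       // bt changes the flags
    );
#else
    // Portable fallback so the sketch still compiles on non-x86 targets.
    if (r & 1u) L = b;
    if (r & 2u) L = c;
#endif
    return L;
}
```

Calling select_asm(1, 2, 3, 1) corresponds to the setup in the question, where L should end up equal to b.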

  • Why bother? Write it in straightforward C and let the compiler optimize it instead of tying it to one particular architecture. – Shawn Jul 10 '22 at 12:27
  • The compiler is perfectly capable of generating assembly for this from C although it doesn't use `bt`. Any particular reason why you need that? `bt` is mostly useful for variable bit index. – Jester Jul 10 '22 at 12:28
  • @Shawn Because I want to do something much more complicated and want to learn some assembly for that, but this is the simplest thing that I fail to learn... – Halid Beslic Jul 10 '22 at 12:29
  • (clang 14, for example, produces very similar output on x86-64 (Just using `testb` instead of `bt` as the only major difference)) – Shawn Jul 10 '22 at 12:30
  • There are two problems: (a) your code writes to `%1` when you have declared it as an input operand. This is not permitted. And (b) your code writes to `%0` before evaluating all input operands. This causes the problem explained in the duplicate. If you must use inline assembly, split it up into as small blocks as possible. Ideally, each block holds only one instruction. In your case, something like `L = a; asm ("bt $0, %1; cmovc %2, %0" : "+r"(L) : "r"(r), "r"(b) : "cc"); asm ("bt $1, %1; cmovc %2, %0": "+r"(L) : "r"(r), "r"(c) : "cc");` might do the trick (untested). – fuz Jul 10 '22 at 15:19
  • @fuz Oh right, you are right. I did indeed fix that first problem and then forgot to mention it in my post. Also, thanks for the tips, I will remember them. – Halid Beslic Jul 10 '22 at 18:55
  • You might also check out [this](https://stackoverflow.com/a/41668986/2189500) answer which shows how to output condition codes. – David Wohlferd Jul 11 '22 at 23:51
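
The two suggestions from the comments can be put side by side in a short sketch: the plain C++ version that Shawn and Jester recommend, and the split asm blocks from fuz's comment. The function names select_plain and select_split are hypothetical, and the x86 guard with its fallback is an addition so that the sketch also builds on other targets. Because each asm block writes %0 only in its last instruction, plain "+r" should be enough here.

```cpp
// Hypothetical reference version in plain C++: c if bit 1 of r is set,
// otherwise b if bit 0 of r is set, otherwise a.
unsigned select_plain(unsigned a, unsigned b, unsigned c, unsigned r) {
    unsigned L = a;
    if (r & 1u) L = b;   // first bit set: take b
    if (r & 2u) L = c;   // second bit set: c overrides b
    return L;
}

// Hypothetical wrapper around the two asm statements from fuz's comment.
unsigned select_split(unsigned a, unsigned b, unsigned c, unsigned r) {
    unsigned L = a;
#if defined(__x86_64__) || defined(__i386__)
    // Each block tests one bit and conditionally moves one value into L.
    asm("bt $0, %1\n\tcmovc %2, %0" : "+r"(L) : "r"(r), "r"(b) : "cc");
    asm("bt $1, %1\n\tcmovc %2, %0" : "+r"(L) : "r"(r), "r"(c) : "cc");
#else
    // Fallback for non-x86 targets, where bt and cmovc do not exist.
    L = select_plain(a, b, c, r);
#endif
    return L;
}
```

Comparing the two functions for r = 0 to 3 is a quick way to check the asm, and looking at the compiler's output for select_plain at -O2 or -O3 shows what it generates without any inline assembly.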
