
What I want from the cmov function below is: if pred is true, return t_val; otherwise return f_val. But when I actually run it, t_val is returned every time.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int cmov(uint8_t pred, uint32_t t_val, uint32_t f_val) {
    uint32_t result;
    __asm__ volatile (
        "mov %2, %0;"
        "test %1, %1;"
        "cmovz %3, %0;"
        "test %2, %2;"
        : "=r" (result)
        : "r" (pred), "r" (t_val), "r" (f_val)
        : "cc"
    );
    return result;
}

int main() {
    int a = 1, b = 4, c = 5, d;
    int res = (a == 3);        // res = 0, so f_val (5) is expected
    printf("res = %d\n", res);
    d = cmov(res, b, c);
    printf("d = %d\n", d);
    a = 3;
    res = (a == 3);            // res = 1, so t_val (4) is expected
    d = cmov(res, b, c);
    printf("d = %d\n", d);
    return 0;
}
Peter Cordes
Gerrie
  • Because you pass a non-zero `b` both times, so the `z` condition is false. Single-step the asm with a debugger if that isn't clear. This seems really inefficient; use a `"+r"` constraint for the output so you don't need a `mov` in the asm template; let the compiler get `result=t_val` done for you. The `test %2,%2` *after* the `cmovz` is also doing nothing; you're not even using a GCC6 flag-output operand. – Peter Cordes Oct 26 '20 at 07:25
  • `test %2, %2` is there to refresh the flags so that no information leaks out. The `res` passed in the first time is 0, and the second time it is 1; you can see this in the printf output. Does a non-zero `b` affect it? Isn't the choice made based on `res`? – Gerrie Oct 26 '20 at 07:30
  • Oh yes, in your confusing, cluttered example the `res` passed as the first arg is actually the predicate, not a result. Fixed my answer with the likely explanation: a missing early-clobber declaration. "Leaking information" makes no sense, though; if an attacker can see FLAGS values, they can see register values, and the predicate is still in registers. – Peter Cordes Oct 26 '20 at 07:36

1 Answer


You're missing an early-clobber on the output (`"=&r"`), so GCC probably picks the same register for the output as one of the inputs, most likely `pred`. In that case `mov %2, %0` overwrites `pred` with `t_val` before it is tested, so `test %1,%1` is probably testing `t_val` (`b`), which is non-zero both times. Single-step the asm with a debugger, and/or look at GCC's asm output (on https://godbolt.org/ or with `gcc -S`).
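
As a minimal sketch of that fix, assuming GCC or Clang targeting x86 or x86-64: the only essential change is `"=&r"` on the output. The function name `cmov_earlyclobber` is made up for illustration, `pred` is widened to `uint32_t` so every operand has the same size, and the trailing `test` and the `volatile` are dropped.

```c
#include <stdint.h>

/* Returns t_val if pred is non-zero, otherwise f_val.
 * GNU C inline asm, AT&T syntax, x86 / x86-64 only.
 * The name cmov_earlyclobber is hypothetical. */
uint32_t cmov_earlyclobber(uint32_t pred, uint32_t t_val, uint32_t f_val)
{
    uint32_t result;
    __asm__ (
        "mov   %2, %0\n\t"   /* result = t_val; written before pred and f_val are read */
        "test  %1, %1\n\t"   /* ZF = (pred == 0) */
        "cmovz %3, %0"       /* if ZF is set, result = f_val */
        : "=&r" (result)     /* '&': output must not share a register with any input */
        : "r" (pred), "r" (t_val), "r" (f_val)
        : "cc");
    return result;
}
```

With the early-clobber in place, `cmov_earlyclobber(0, 4, 5)` should give 5 and `cmov_earlyclobber(1, 4, 5)` should give 4, because the `mov` can no longer land on the register holding `pred`.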

This seems really inefficient; use a `"+r"(result)` constraint for the output (with `uint32_t result = t_val;`) so you don't need a `mov` in the asm template. Let the compiler get `result = t_val` done for you, possibly by simply choosing the same register.
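
A sketch of that `"+r"` version, under the same assumptions (GCC or Clang, x86 or x86-64, AT&T syntax). The name `cmov_select` and the `uint32_t` type for `pred` are illustrative choices, not part of the original code.

```c
#include <stdint.h>

/* Returns t_val if pred is non-zero, otherwise f_val.
 * The compiler initializes result with t_val, so the template needs no mov.
 * The name cmov_select is hypothetical. */
uint32_t cmov_select(uint32_t pred, uint32_t t_val, uint32_t f_val)
{
    uint32_t result = t_val;     /* compiler arranges for %0 to hold t_val */
    __asm__ (
        "test  %1, %1\n\t"       /* ZF = (pred == 0) */
        "cmovz %2, %0"           /* if ZF is set, result = f_val */
        : "+r" (result)          /* read-write operand: input t_val, output result */
        : "r" (pred), "r" (f_val)
        : "cc");
    return result;
}
```

No early-clobber is needed here: the only write to `%0` is in the final instruction, after every input has been read.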

The `test %2,%2` after the `cmovz` is also doing nothing; you're not even using a GCC6 flag-output operand. It's a totally wasted instruction.

Also, this doesn't need to be volatile. The output is a pure function of the inputs, and doesn't need to run at all if the output is unused.

It's probably a bad idea to use inline asm at all for just a `cmov`; compile with `-O3` and write your source such that GCC thinks it's a good idea to do if-conversion into branchless code. Inline asm destroys constant propagation and defeats other optimizations. This approach also forces you to use a `test` instruction in the template instead of reading FLAGS set by some earlier `add` or whatever, and it stops the compiler from reusing the same FLAGS result for multiple `cmov` or other instructions. https://gcc.gnu.org/wiki/DontUseInlineAsm
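
For illustration, a plain-C sketch of source that tends to be if-converted. Whether a given compiler actually emits `cmov` here depends on the compiler version, the options (for example `-O2` or `-O3`) and its tuning heuristics, so treat this as a hint rather than a guarantee. The names `select_u32` and `sum_at_least` are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* A simple select: both values are already computed and the expression has
 * no side effects, which makes it a good candidate for if-conversion. */
uint32_t select_u32(int pred, uint32_t t_val, uint32_t f_val)
{
    return pred ? t_val : f_val;
}

/* Conditional accumulation written as a select instead of an if statement.
 * The compiler is free to use cmov, a branch, or SIMD here. */
uint32_t sum_at_least(const uint32_t *a, size_t n, uint32_t threshold)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t add = (a[i] >= threshold) ? a[i] : 0;
        sum += add;
    }
    return sum;
}
```

Checking the generated asm (on https://godbolt.org/ or with `gcc -S`) is the only way to see which form the compiler chose for a particular build.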

Or if you can't hand-hold the compiler into making asm you want, write more of your real use-case in asm, not just a wrapper around cmov.

Peter Cordes
  • Thanks for your answer. But if t_val is an address, how can I transfer the data at that address to the result? And does `cmovz` transfer one byte at a time? – Gerrie Oct 26 '20 at 08:45
  • @cyj: Dereference the result of cmov if you want. Or if `t_val` is guaranteed to be a valid address even if the condition is false, you could use it as a memory operand. But [`cmov` can't do a conditional load: the load always happens with a memory source](https://stackoverflow.com/questions/54050188), feeding an ALU select. And no, of course `cmov` doesn't transfer one byte at a time. It's just an ALU select between two inputs of whatever the operand-size is: 16, 32, or 64-bit. https://uops.info/ shows that it's a single uop on modern CPUs. – Peter Cordes Oct 26 '20 at 08:56
  • In your answer just now, you mentioned the code that converts if to branchless. This is actually what I want to do. Can you give a more detailed introduction to this point mentioned in your answer? – Gerrie Oct 26 '20 at 10:17
  • @cyj: [Conditional move (cmov) in GCC compiler](https://stackoverflow.com/q/60019582) / [Convert GCC Inline Assembly CMOV to Visual Studio Assembler](https://stackoverflow.com/q/37778784) / [Make gcc use conditional moves](https://stackoverflow.com/q/30486708) / [gcc optimization flag -O3 makes code slower than -O2](https://stackoverflow.com/q/28875325). Does it need to be branchless for some security reason, like no data-dependent branching for timing side-channels? If not, hint the compiler in the right direction and let it make tuning decisions. (With profile-guided optimization if possible.) – Peter Cordes Oct 26 '20 at 11:18
  • I want to try to close some timing side channels or cache side channels with cmov. Do you think it can be done? Or from the perspective of efficiency? – Gerrie Oct 26 '20 at 11:29
  • @cyj: Ok then yes, inline asm is the only way to guarantee you get a `cmov`. But you still only need `test`/`cmov` in the asm statement. Removing the `mov` instruction would have made this bug impossible as well as helping efficiency. Cache-timing side-channels are harder to close; any data-dependent array indexing creates that possibility. As far as efficiency goes, using inline asm like this will typically hurt; compilers can normally choose cmov on their own when it's the best choice for performance, especially with profile-guided optimization. – Peter Cordes Oct 26 '20 at 11:32
  • The last `test %2, %2` comes from a paper I read; its purpose is to clear the flags with this instruction to avoid leaking information. Why remove the `mov` instruction? Its effect is to pass t_val to result when pred is true. If the `mov` is removed, the returned value is not t_val when pred is true. – Gerrie Oct 26 '20 at 11:44
  • @cyj: Obviously you have to change the constraints to remove the `mov`, like I said in my answer. Ask the compiler to already have `t_val` in `%0` either with a matching constraint, or more simply with a `"+r"` input like I described in more detail in my answer. As for why, to make it less inefficient and avoid the need for an early clobber. re: leaking information: Leaking to where, across what privilege boundary? FLAGS is probably going to get overwritten within the next few instructions anyway, and the `%1` operand hasn't been overwritten so it's still there in another register. – Peter Cordes Oct 26 '20 at 11:54
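
Two points from the comments above can be sketched in code, again assuming GCC or Clang on x86 or x86-64. The first function uses a matching constraint (`"0"`) as the alternative to `"+r"` for getting `t_val` into `%0` without a `mov`. The second selects between two pointers and dereferences the result afterwards, which assumes both pointers are valid. The names `cmov_matching` and `load_selected` are hypothetical.

```c
#include <stdint.h>

/* Matching constraint: input operand 2 (t_val) is placed in the same
 * register as output operand 0, so %0 already holds t_val on entry. */
uint32_t cmov_matching(uint32_t pred, uint32_t t_val, uint32_t f_val)
{
    uint32_t result;
    __asm__ (
        "test  %1, %1\n\t"      /* ZF = (pred == 0) */
        "cmovz %3, %0"          /* if ZF is set, result = f_val */
        : "=r" (result)
        : "r" (pred), "0" (t_val), "r" (f_val)
        : "cc");
    return result;
}

/* Select an address with cmov, then load through it in plain C.
 * Both t_ptr and f_ptr must point to readable memory. The load address
 * still depends on pred, so this does not hide pred from a cache-timing
 * observer. */
uint32_t load_selected(uint32_t pred, const uint32_t *t_ptr, const uint32_t *f_ptr)
{
    const uint32_t *p = t_ptr;
    __asm__ (
        "test  %1, %1\n\t"      /* ZF = (pred == 0) */
        "cmovz %2, %0"          /* pointer-sized select: p = f_ptr if ZF is set */
        : "+r" (p)
        : "r" (pred), "r" (f_ptr)
        : "cc");
    return *p;                  /* ordinary load from the selected address */
}
```

The `cmovz` in `load_selected` operates on whole pointer-sized registers, which matches the point above that `cmov` is a register-width select rather than a byte-at-a-time copy.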