
I have a program where I need to pass a pointer to a local variable to an asm statement. (For those who are curious, the intended use of this code is to call the INVVPID instruction.)

#include <stdio.h>

static void f(long x){
    long a = x;
    asm volatile(
        "xorq %%rcx, %%rcx;"
        "cmpq (%%rax), %%rcx;"
        "xorq %%rax, %%rax"
        :
        : "a"(&a)
        : "cc", "memory");
}

int main(void)
{
    f(0);
    f(1);
}

I am using a 64-bit machine and gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-1). When compiled with -O0, the program runs without error. However, when compiled with -O3, it segfaults at 0x401043 because RAX=0. The disassembly is:

0000000000401020 <main>:
  401020:   48 c7 44 24 f8 00 00    movq   $0x0,-0x8(%rsp)
  401027:   00 00 
  401029:   48 8d 44 24 f8          lea    -0x8(%rsp),%rax
  40102e:   48 31 c9                xor    %rcx,%rcx
  401031:   48 3b 08                cmp    (%rax),%rcx
  401034:   48 31 c0                xor    %rax,%rax
  401037:   48 c7 44 24 f8 01 00    movq   $0x1,-0x8(%rsp)
  40103e:   00 00 
  401040:   48 31 c9                xor    %rcx,%rcx
  401043:   48 3b 08                cmp    (%rax),%rcx
  401046:   48 31 c0                xor    %rax,%rax
  401049:   31 c0                   xor    %eax,%eax
  40104b:   c3                      ret    

It looks like the problem happens because the local variable a is optimized out in f(1). The crash goes away if I change the declaration to volatile long a = x;.

My question is: is this legitimate behavior for GCC? My intuition tells me that a should not be optimized out, because its address is used in f().

user207421
Eric Stdlib
  • Well, that's what `volatile` is for: to tell the compiler that this variable can be changed in ways the compiler can't see. – user207421 Aug 29 '22 at 01:49
  • @user207421 that's not the point of the question. `&a` appears in the input list of the ASM so the compiler IS aware and it shouldn't optimize it out. – bolov Aug 29 '22 at 01:52
  • Try to add `a` to the input list and see what happens. – bolov Aug 29 '22 at 01:53
  • Oh, wrong duplicate; you have UB from writing `%rax` without telling the compiler about it! `"a"(stuff)` tells the compiler about a read-only input, but your template writes RAX so the pointer isn't still there for the next `asm` statement. `volatile long a = x` doesn't fix the UB, it just happens to work in that compiler version. – Peter Cordes Aug 29 '22 at 01:57
  • It's not optimizing out the local var; as you can see, it does both stores to it (`movq $0x0,-0x8(%rsp)` and later `$1`), and sets up the pointer to it in RAX exactly like you asked for. Looking at the compiler's asm output (not disassembly), e.g. on https://godbolt.org/, would make it easier to see which instructions are compiler-generated vs. which are from the asm template. – Peter Cordes Aug 29 '22 at 02:13
  • @Peter Cordes is correct. I did not notice that my template modifies RAX even though I only declared it as an input operand. That is what causes the undefined behavior. After I remove `xorq %%rax, %%rax`, it looks like the problem no longer happens. – Eric Stdlib Aug 29 '22 at 02:25
  • That's still just luck unless you also removed the zeroing of `rcx` or declared a clobber on it. You could ask the compiler to give you a zero in a register of its choice, or use `cmpq $0, (%%rax)` if you can use the opposite compare direction. (So CF won't ever get set, meaning you can't `adc` into something to count non-zero elements.) – Peter Cordes Aug 29 '22 at 02:33
