How does asm volatile execute?

Question

gcc_inline void
lcr0(uint32_t val)
{
    __asm __volatile("movl %0,%%cr0" : : "r" (val));
}

In the above code, I'm not sure where val is inserted into the assembly string. Does val replace the %c in the string?

If possible can someone clarify what : : "r" does as well?

Please [read the manual](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html). — fuz, Mar 20 '20 at 00:40
"Does val replace the %c in the string" - Actually, it generates an assembler instruction to move the value from (val) into Control Register 0 (cr0). — David Wohlferd, Mar 20 '20 at 00:49
You can always look at the compiler's asm output for a stand-alone version of this function, e.g. on https://godbolt.org/. See also https://stackoverflow.com/tags/inline-assembly/info for guides and docs. — Peter Cordes, Mar 20 '20 at 00:55
I'd recommend a `"memory"` clobber on this; you want GCC to make sure it respects source order when optimizing loads and stores around this asm statement. CR0 bits include "write-protect" to make the kernel respect pages being read-only or not, and "paging-enable". — Peter Cordes, Mar 20 '20 at 06:12

Chris Hall · Answer 1 · 2020-03-20T11:45:35.557

1

"r" means you are specifying %0 to be a register (as an input). (val) means you are specifying that the register should contain the value of val. So the compiler will allocate a register and make sure it contains val. For x86_64 the first argument for the function will be in %edi/%rdi, and that is what %0 will expand to.

I stand corrected...

...if the function is not inlined, val will be passed in edi/rdi but might be shuffled around before the asm, but the "r" will cause the compiler to arrange for it to be in some register for the asm. (See the effect of -O0, below).

Also a function which is not declared/defined to be inline may be inlined, at higher levels of optimization.
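To make the operand substitution concrete, here is a minimal user-mode sketch (writing CR0 itself needs privilege level 0, so it cannot be run as an ordinary program). The helper name `copy_through_register` is hypothetical and not part of the original code; it only shows how `%0` and `%1` are replaced by whatever registers the compiler picks for the `"=r"` output and `"r"` input operands. On non-x86 targets the sketch falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: copies val through compiler-chosen registers. */
uint32_t copy_through_register(uint32_t val)
{
    uint32_t out;
#if defined(__i386__) || defined(__x86_64__)
    /* Operands are numbered in order of appearance:
     *   %0 = out, "=r": write-only output in any general-purpose register
     *   %1 = val, "r":  input in any general-purpose register
     * A doubled "%%" (as in %%cr0) emits a literal '%' instead of
     * naming an operand. AT&T syntax: source first, destination second. */
    __asm__ __volatile__("movl %1, %0" : "=r"(out) : "r"(val));
#else
    out = val; /* fallback so the sketch still builds on other targets */
#endif
    return out;
}
```

Compiling this with `-S`, or on godbolt.org as suggested in the comments, shows the template with real register names filled in, which may well differ between `-O0` and `-O2`.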

I note that CR0 can only be read into or written from a general-purpose register, and then only at privilege level 0. @PeterCordes notes that a "memory" clobber would probably be a Good Idea. Clearly, changing CR0 can have really exciting side effects!
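A sketch of what the suggested `"memory"` clobber could look like follows. The names `lcr0_ordered`, `compiler_barrier`, `publish_flag` and `shared_flag` are hypothetical. As an assumption that differs from the original, the wrapper takes a `uintptr_t` and uses a suffix-less `mov`, so that the register width matches the target mode (in 64-bit mode a move to a control register uses a 64-bit register). `lcr0_ordered` can be compiled but must not be called from user mode; `publish_flag` is a harmless user-mode illustration of the same clobber.

```c
#include <stdint.h>

/* Empty template plus "memory" clobber: emits no instruction, but the
 * compiler may not move accesses to globally visible memory across it. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

#if defined(__i386__) || defined(__x86_64__)
/* Hypothetical CR0 writer with the clobber added. Privilege level 0 only:
 * executing this in user mode faults. */
static inline void lcr0_ordered(uintptr_t val)
{
    __asm__ __volatile__("mov %0, %%cr0" : : "r"(val) : "memory");
}
#endif

static int shared_flag;

/* The store to shared_flag is completed before the barrier, and the
 * load after the barrier really re-reads memory. */
int publish_flag(int v)
{
    shared_flag = v;
    compiler_barrier();
    return shared_flag;
}
```

As the comments below point out, the clobber only affects memory that could be globally accessible; locals that never have their address taken can still live in registers across the statement.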

When I tried this at -O0 I found that a simple inline was ignored, and the function compiled for x86_64 to:

   lcr0:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edi, -4(%rbp)
        movl    -4(%rbp), %eax
        movl %eax,%cr0
        nop
        popq    %rbp
        ret

I guess that gcc_inline may include __attribute__((__always_inline__)), in which case even at -O0 lcr0 is inlined -- but with lots of lovely stack business. This time for x86:

  main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $16, %esp
        movl    12(%ebp), %eax
        movl    (%eax), %eax
        movl    (%eax), %eax
        movl    %eax, -4(%ebp)
        movl    -4(%ebp), %eax
        movl    %eax, -8(%ebp)
        movl    -8(%ebp), %eax
        movl %eax,%cr0
        nop
        movl    $0, %eax
        leave
        ret

edited Mar 20 '20 at 11:45

answered Mar 20 '20 at 00:43


The last sentence is kinda misleading. It's only true because it's the most optimal placement; it wouldn't necessarily be the case, and might not even at `-O0` if the compiler forces spills of everything to an actual memory address then reloads in a random register. – R.. GitHub STOP HELPING ICE Mar 20 '20 at 00:46
Also the function is inline, so if actually inlined `val` is likely to live in something other than the ABI first-argument register. – R.. GitHub STOP HELPING ICE Mar 20 '20 at 00:47
@R..GitHubSTOPHELPINGICE: thank you... I have attempted to make good. – Chris Hall Mar 20 '20 at 01:11
@Chris: Usually nobody cares about `-O0`, but the important point is that wrapper functions usually/hopefully inline regardless of `inline`. So the actual input value after optimization could be coming from anywhere; register, memory, or immediate. The important point is that you're asking the compiler to have it in a GP-integer register for you when the asm template runs, and that's all. Also worth noting it's *not* ordered wrt. loads/stores to memory because there's no `"memory"` clobber. Since CR0 bits include paging-enabled and write-protect, you should probably use a `"memory"` clobber – Peter Cordes Mar 20 '20 at 01:12
@PeterCordes, for completeness: I note also that when reading/writing `CR0` the destination/source can *only* be a register; and it's a privilege level 0 operation. The need for a `"memory"` clobber is interesting. I have come to believe that any register spilling excludes things which the compiler has reason to believe are not visible outside the current function. And, of course, it cannot spill stuff held in the caller's registers (or its ancestors). Clearly anybody writing to `CR0` had better be *wide awake*! – Chris Hall Mar 20 '20 at 10:58
`#define gcc_aligned(mult) __attribute__((aligned (mult)))` I apologize for not providing the code for gcc_aligned but this is how it is defined. – Hamza Mcsheehan Mar 20 '20 at 16:42
@ChrisHall: that's correct; a `"memory"` clobber looks to the optimizer like an opaque non-inline function call that can thus read/write any possibly *globally accessible* memory. Locals can be kept in a register across it, or even just reordered. I think if you're modifying CR0 from C you hopefully at least have the same stack and have it read+write mapped both before and after, so I didn't mention it. There are also many other bits in CR0, some of which don't change the meaning of loads/stores. (Most kernels would enable PM + paging once in hand-written asm, but might flip write-protect) – Peter Cordes Mar 20 '20 at 19:57
@ChrisHall: showing the asm for an `always_inline` function inlining with optimization disabled is not really useful to anyone. It's kind of a separate corner case that just confuses the issues at hand here. See [Why is this C++ wrapper class not being inlined away?](https://stackoverflow.com/a/54074497) for an explanation of that behaviour. – Peter Cordes Mar 21 '20 at 00:33
re: "a simple `inline` was ignored". That's because `inline` is kind of similar to `static`; you're just telling the compiler that any other compilation units that call this function will also see a global definition of it. (The name makes sense because *if* it chooses to inline into every call site, it can skip emitting a stand-alone definition for the function.) IDK if gcc makes the `inline` keyword bias the heuristics to inline or not-inline any more than `static` would. So if you know exactly what `inline` means and doesn't mean, saying it's "ignored" is a bit sloppy but works. – Peter Cordes Mar 21 '20 at 00:36
