When to use earlyclobber constraint in extended GCC inline assembly?

Question

I understand when to use a clobber list (e.g. listing a register which is modified in the assembly so that it doesn't get chosen for use as an input register, etc.), but I can't wrap my head around the earlyclobber constraint &. If you list your outputs, wouldn't that already mean that inputs can't use the selected register (aside from matching digit constraints)?

For example:

asm(
    "movl $1, %0;"
    "addl $3, %0;"
    "addl $4, %1;"   // separate bug: modifies input-only operand
    "addl %1, %0;"
    : "=g"(num_out)
    : "g"(num_in)
    :
);

Would & even be needed for the output variables? The compiler should know the register that was selected for the output, and thus know not to use it for the input.

score 31 · Accepted Answer · edited Feb 28 '22 at 13:59

By default, the compiler assumes all inputs will be consumed before any output registers are written to, so that it's allowed to use the same registers for both. This leads to better code when possible, but if the assumption is wrong, things will fail catastrophically. The "early clobber" marker is a way to tell the compiler that this output will be written before all the input has been consumed, so it cannot share a register with any input.

GNU C inline asm syntax was designed to wrap a single instruction as efficiently as possible. You can put multiple instructions in an asm template, but the defaults (assuming that all inputs are read before any outputs are written) are designed around wrapping a single instruction.

It's the same constraint syntax as GCC uses in its machine-description files that teach the compiler what instructions are available in an ISA.
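As a rough sketch of how this applies to the snippet in the question (not the original poster's code): the version below keeps the same arithmetic but adds the constant 4 to the output instead of modifying the input, and marks the output as earlyclobber because it is written before the input is read. The function name `add_eight`, the switch from "g" to "r" constraints, and the plain-C fallback for non-x86 targets are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: returns num_in + 8, the same arithmetic as the
 * question's snippet, without writing to the input operand. */
static uint32_t add_eight(uint32_t num_in)
{
    uint32_t num_out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ (
        "movl $1, %[out]\n\t"     /* out is written here, before in is read */
        "addl $3, %[out]\n\t"
        "addl %[in], %[out]\n\t"  /* first and only read of the input */
        "addl $4, %[out]"         /* the +4 is applied to out, not to in */
        : [out] "=&r" (num_out)   /* earlyclobber: may not share a register with in */
        : [in] "r" (num_in)       /* input-only, never modified */
        : "cc"                    /* addl changes the flags */
    );
#else
    /* Fallback so the sketch still compiles on non-x86 targets. */
    num_out = num_in + 8;
#endif
    return num_out;
}
```

Without the `&`, the compiler would be allowed to give `num_out` and `num_in` the same register, and the first `movl` would then destroy the input before it is read.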

edited Feb 28 '22 at 13:59

Peter Cordes

answered Apr 04 '13 at 19:23

R.. GitHub STOP HELPING ICE

So that means that the compiler could assume, for example, that the `eax` register I use for an input is no longer needed by the time the output is needed, so it reuses the `eax` register? Does this mean the output in my code in the original question does in fact require a `&` modifier? – Vilhelm Gray Apr 04 '13 at 19:29

Your code is wrong for multiple reasons. For instance, you're modifying an input register. – R.. GitHub STOP HELPING ICE Apr 04 '13 at 20:31

A typical "earlyclobber" for x86 is the use of extending multiplication, where `EDX:EAX` / `RDX:RAX` _implicitly_ are outputs. When those instructions are used in a multi-input `asm()` statement, the `d` and `a` registers can not be used as input after the earlyclobber output "consumed" the contents. If that specific input operand is used more than once, the compiler will put it into a _different_ register. – FrankH. Apr 19 '13 at 08:43
Related about *why* we need to tell the compiler about stuff, and the design philosophy: [Why we need Clobbered registers list in Inline Assembly?](https://stackoverflow.com/q/69453851) (I already added a couple paragraphs in this answer about why it's designed this way.) – Peter Cordes Feb 28 '22 at 14:00
Interesting note about the original design intent being a single instruction. I've been reading some GCC documentation on inline ASM and it kept using the singular word 'instruction' to refer to the contents of the ASM. It did this often enough that I figured it couldn't have been a typo. – sherrellbc Jan 28 '23 at 13:52

Ciro Santilli OurBigBook.com · Answer 2 · 2020-11-02T07:36:47.110

Minimal educational example

Here I provide a minimal educational example that attempts to make what https://stackoverflow.com/a/15819941/895245 mentioned clearer.

This specific code is of course not useful in practice, and could be done more efficiently with a single lea 1(%q[in]), %[out] instruction; it is just a simple educational example.

main.c

#include <assert.h>
#include <inttypes.h>

int main(void) {
    uint64_t in = 1;
    uint64_t out;
    __asm__ (
        "mov %[in], %[out];" /* out = in */
        "inc %[out];"        /* out++ */
        "mov %[in], %[out];" /* out = in */
        "inc %[out];"        /* out++ */
        : [out] "=&r" (out)
        : [in] "r" (in)
        :
    );
    assert(out == 2);
}

Compile and run:

gcc -ggdb3 -std=c99 -O3 -Wall -Wextra -pedantic -o main.out main.c
./main.out

This program is correct and the assert passes, because & forces the compiler to choose different registers for in and out.

This is because & tells the compiler that in might be used after out was written to, which is actually the case here.

Therefore, the only way to not wrongly modify in is to put in and out in different registers.

The disassembly:

gdb -nh -batch -ex 'disassemble/rs main' main.out

contains:

   0x0000000000001055 <+5>:     48 89 d0        mov    %rdx,%rax
   0x0000000000001058 <+8>:     48 ff c0        inc    %rax
   0x000000000000105b <+11>:    48 89 d0        mov    %rdx,%rax
   0x000000000000105e <+14>:    48 ff c0        inc    %rax

which shows that GCC chose rax for out and rdx for in.

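If we remove the &, however, the asm statement is no longer correct: the compiler is allowed to put in and out in the same register, so the result depends on its register allocation.

On my test system, the assert actually fails, because the compiler tries to minimize register usage, and compiles to: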

   0x0000000000001055 <+5>:     48 89 c0        mov    %rax,%rax
   0x0000000000001058 <+8>:     48 ff c0        inc    %rax
   0x000000000000105b <+11>:    48 89 c0        mov    %rax,%rax
   0x000000000000105e <+14>:    48 ff c0        inc    %rax

therefore using rax for both in and out.

The result of this is that out is incremented twice, and equals 3 instead of 2 in the end.

Tested in Ubuntu 18.10 amd64, GCC 8.2.0.
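For contrast, here is a minimal sketch of the single-instruction lea alternative mentioned at the top of this answer. Because the one instruction reads its input before it writes its output, the default assumption holds and no & is needed; the compiler may even pick the same register for both operands. The function name `increment` and the plain-C fallback for non-x86-64 targets are assumptions added for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: returns in + 1 using a single lea. */
static uint64_t increment(uint64_t in)
{
    uint64_t out;
#if defined(__x86_64__)
    __asm__ (
        "lea 1(%q[in]), %[out]"  /* out = in + 1; input read before output written */
        : [out] "=r" (out)       /* no earlyclobber needed for one instruction */
        : [in] "r" (in)
    );
#else
    /* Fallback so the sketch still compiles on other targets. */
    out = in + 1;
#endif
    return out;
}
```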

More practical examples

multiplication implicit output registers
non-hardcoded scratch registers: GCC: Prohibit use of some registers
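The second item can be sketched roughly as follows, assuming x86-64 and AT&T syntax: a dummy output declared with "=&r" acts as a named scratch register that the compiler picks, instead of a hard-coded one in the clobber list. The names `triple_plus` and `tmp` are made up for this illustration, and the plain-C fallback is only there so the sketch compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: returns a * 3 + b using a compiler-chosen scratch register. */
static uint64_t triple_plus(uint64_t a, uint64_t b)
{
    uint64_t out;
#if defined(__x86_64__)
    uint64_t tmp;                           /* scratch; its final value is ignored */
    __asm__ (
        "lea (%[a],%[a],2), %[tmp]\n\t"     /* tmp = a * 3, written before b is read */
        "lea (%[tmp],%[b]), %[out]"         /* out = tmp + b, last read of the inputs */
        : [out] "=r" (out),                 /* written last, so it may share with an input */
          [tmp] "=&r" (tmp)                 /* earlyclobber: must not overlap a or b */
        : [a] "r" (a), [b] "r" (b)
    );
#else
    /* Fallback so the sketch still compiles on other targets. */
    out = a * 3 + b;
#endif
    return out;
}
```

Here only `tmp` needs the `&`: it is written while `b` is still unread, whereas `out` is written by the final instruction after every input has been consumed.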

Worth mentioning that this example is only an intentionally simplistic easy-to-understand example. An `asm` statement where the first or last instruction is `mov` usually means you're doing it wrong. Use constraints to get the compiler to put inputs where you want them, or take outputs wherever you leave them (even in EFLAGS with GCC6 condition-code outputs.) Also, x86 can copy-and-increment with `lea 1(%q[in]), %out`. (`q` modifier to force 64-bit address size, regardless of output size. https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86-Operand-Modifiers) — Peter Cordes, Feb 27 '19 at 04:23
@PeterCordes definitely, that code is useless. A useful example would also be a good addition. — Ciro Santilli OurBigBook.com, Feb 27 '19 at 09:45
Nothing jumped to mind immediately, just thought it would be worth a comment in the code for the benefit of people trying to learn inline asm in general that it should basically never start with `mov`, because constraints. (Although copying a input that you read multiple times is one actually reasonable use-case.) — Peter Cordes, Feb 27 '19 at 09:47
@CiroSantilli新疆改造中心六四事件法轮功 Another use of `=&r`, which nobody seems to have mentioned explicitly, is having temporary register variables that one can refer to by meaningful names, e.g. `%[running_sum]` instead of hard-coded `%r8`. — Alexey Frunze, Feb 27 '19 at 11:45
@AlexeyFrunze cool! I ended up adding an example of that at: https://stackoverflow.com/questions/6682733/gcc-prohibit-use-of-some-registers/54963829#54963829 — Ciro Santilli OurBigBook.com, Mar 02 '19 at 22:56
@AlexeyFrunze: Yup, and even better that lets the compiler pick a convenient register for you, instead of hard-coding register allocation choices in the asm template. — Peter Cordes, Oct 26 '20 at 07:51
