Can I tell the compiler that I need to earlyclobber a memory operand?

Question

Consider this program, which can be compiled as either 32-bit or 64-bit:

#include <stdio.h>

static int f(int x, int y) {
    __asm__(
        "shrl $4, %0\n\t"
        "movl %1, %%edx\n\t"
        "addl %%edx, %0"
        : "+r"(x)      // needs "+&r" to work as intended
        : "r"(y)
        : "edx"
    );
    return x;
}

int main(void) {
    printf("0x%08X\n", f(0x10000000, 0x10000000));
}

At -O1 or higher, it gives the wrong answer (0x02000000 instead of 0x11000000), because x gets written before y gets read, but the constraint for x doesn't have the & to specify earlyclobber, so the compiler put them in the same register. If I change +r to +&r, then it gives the right answer again, as expected.

Now consider this program:

#include <stdio.h>

static int f(int x, int y) {
    __asm__(
        "shrl $4, %0\n\t"
        "movl %1, %%edx\n\t"
        "addl %%edx, %0"
        : "+m"(x)        // Is this safe without "+&m"?  Compilers reject that
        : "m"(y)
        : "edx"
    );
    return x;
}

int main(void) {
    printf("0x%08X\n", f(0x10000000, 0x10000000));
}

Other than using m constraints instead of r constraints, it's exactly the same. Now it happens to give the right answer even without the &. However, I understand relying on this to be a bad idea, since I'm still writing to x before I read from y without telling the compiler I'm doing so. But when I change +m to +&m, my program no longer compiles: GCC tells me error: input operand constraint contains '&', and Clang tells me invalid output constraint '+&m' in asm. Why doesn't this work?

I can think of two possibilities:

1. It's always safe to earlyclobber things in memory, so the & is rejected as redundant
2. It's never safe to earlyclobber things in memory, so the & is rejected as unsatisfiable

Is one of those the case? If the latter, what's the best workaround? Or is something else going on here?

I'm not sure I'm prepared to offer any guidance one way or another. Have you read (carefully) the [docs](https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html) for `&`? The phrasing is a bit odd, but there could be a clue if you stare at it long enough. You aren't using "alternatives" so you can ignore that part. — David Wohlferd, Jun 08 '20 at 06:46
Note the `"=&m"` is allowed, but a `"0"(x)` matching constraint for it gets warnings. https://godbolt.org/z/4kKNq4. I think `+` operands are internally implemented as separate output and input operands with a matching constraint to make sure they pick the same location. If `"=&m"(x)` and `"m"(x)` are guaranteed to always pick the same memory, that would be safe. But probably in practice `"+m"(x)` is safe, if memory operands always pick the C object's permanent address, like it would pass if you did `func(&x)`. — Peter Cordes, Jun 08 '20 at 07:07
@DavidWohlferd I assume you're referring to "Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address." I can't think of any meaning of that sentence that's consistent with how the compiler is actually working. My best guess would be that it's just saying not to use `%eax` as an output if `(%eax)` is an input, but that doesn't seem to explain why it fails with `+&m`. My next-best guess would be that it means it can't be in memory at all, but `=&m` is accepted, which rules that out. — Joseph Sible-Reinstate Monica, Jun 08 '20 at 15:11

score 4 · Accepted Answer · answered Jun 19 '20 at 16:19

I think "+m" and "=m" are safe without an explicit &.

From the docs, my emphasis added:

&
Means (in a particular alternative) that this operand is an earlyclobber operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address.

Over-interpreting this could be problematic, but given the fact that it seems safe in practice, and there are good reasons why that should be the case, I think the following interpretation of the docs (i.e. guaranteed behaviour for GCC) is reasonable:

"Memory address" is talking about the addressing mode itself, e.g. something like 16(%rdx), that GCC invents and substitutes in for %1 if you have a "m"(foo) memory operand for example. It's not talking about early-clobbering pointed-to memory, only registers that might be read as part of the addressing mode.

It means GCC needs to avoid picking the same register in any addressing mode as it picked for an early-clobber register operand. This lets you safely use "m" operands (and +m or =m) in the same statement as an "=&r" operand, just like you can use "r" operands. It's the register output operand that needs to be flagged with &, not the potential readers.

The fact that it explicitly says "in a register" implies that this is only a concern at all for register operands, not memory.

In the C abstract machine, every object has a memory address (except register int foo).

I think compilers will always pick that address for "m" / "+m" operands, not some invented temporary. For example, I think it's safe / supported to lea that memory operand and store the address somewhere, if it would be safe to do tmp = &foo; in C.

You can think of "earlyclobber" as "don't pick the same location as any input operand". Since different objects have different addresses, that already happens for free for memory.

Unless you specified the same object for separate input and output operands, of course. In the register case for "=&r"(foo) and "r"(foo) you would get separate registers for the input and result. But not for memory, even if you use an early-clobber "=&m"(foo) operand, which does compile even though "+&m" doesn't.

Random facts, experiments on Godbolt:

"m"(y+1) doesn't work as an input: "memory input 1 is not directly addressable". But it works for a register. Memory source operands may have to be objects that exist in the C abstract machine.
"+&m"(x) doesn't compile: error: input operand constraint contains '&'

"=&m"(x) compiles cleanly. However, a "0"(x) matching constraint for it gets a warning: warning: matching constraint does not allow a register. https://godbolt.org/z/4kKNq4.

+ operands appear to be internally implemented as separate output and input operands with a matching constraint to make sure they pick the same location. (More evidence: if you use just one "+r" operand, you can reference %1 in the asm template without a warning, and it's the same register as %0.)

It appears that "=&m"(x) and "m"(x) will always pick the same memory anyway, even without a matching constraint. (For the same reason that it's not the same memory as any other object, which is why "+&m"(x) is redundant.)

If the lifetimes of two C objects overlap, their addresses will be distinct. So I think this works just like passing pointers to locals to a non-inline function, as far as the optimizer is concerned. It can't invent aliasing between them. e.g.

  int x = 1;
  {
    int tmp = x;     // dead after this call.
    foo(&x, &tmp);
  }

For example, the above code can't pass the same address for both arguments of foo (e.g. by optimizing away tmp). Same for an inline-asm statement with "=m"(x) and "m"(tmp) operands. No early-clobber needed.

A lot of this reasoning is extrapolated from how one would reasonably expect it to work, but that is consistent with how it appears to work in practice and with the wording in the docs. I mention this as a caution against applying the same reasoning without any support from the docs for other cases.

Re: point 2: Even if early-clobber were necessary, it would always be satisfiable for memory. Every object has its own address. It's the programmer's fault if you pass overlapping union members as memory inputs and outputs. The compiler won't create that situation if it wasn't present in the source. e.g. it won't elide a temporary variable if it would mean that a memory input overlaps a memory output. (Or at all).

"I *think* compilers will always pick that address for `"m"` / `"+m"` operands, not some invented temporary." Probably worth mentioning that that's definitely not true with `"rm"`. In that case, it seems like clang will *always* pick memory and then invent a temporary: https://godbolt.org/z/QuYi2c — Joseph Sible-Reinstate Monica, Jun 19 '20 at 18:29
@JosephSible-ReinstateMonica: yuck. Even more reason to avoid that if you care about compiling with clang. Sometimes you can work around it with a multi-alternative `"r,m"` instead of `"rm"`, but it's apparently a longstanding known missed-optimization bug in clang that they don't see a clean way to fix. [clang (LLVM) inline assembly - multiple constraints with useless spills / reloads](https://stackoverflow.com/q/16850309) (I see you commented there so this isn't news to you, but maybe some future readers). Not sure where to mention that in this answer, if at all; suggestions welcome. — Peter Cordes, Jun 19 '20 at 20:32
