GCC Inline Assembly side effects

Question

Could someone explain to me (in other words) the following section from the GCC documentation:

Here is a fictitious sum of squares instruction, that takes two pointers to floating point values in memory and produces a floating point register output. Notice that x, and y both appear twice in the asm parameters, once to specify memory accessed, and once to specify a base register used by the asm. You won’t normally be wasting a register by doing this as GCC can use the same register for both purposes. However, it would be foolish to use both %1 and %3 for x in this asm and expect them to be the same. In fact, %3 may well not be a register. It might be a symbolic memory reference to the object pointed to by x.

asm ("sumsq %0, %1, %2"
 : "+f" (result)
 : "r" (x), "r" (y), "m" (*x), "m" (*y));

Here is a fictitious *z++ = *x++ * *y++ instruction. Notice that the x, y and z pointer registers must be specified as input/output because the asm modifies them.

asm ("vecmul %0, %1, %2"
 : "+r" (z), "+r" (x), "+r" (y), "=m" (*z)
 : "m" (*x), "m" (*y));

In the first example, what is the point of listing *x and *y in the input operands? The same doc states:

In particular, there is no way to specify that input operands get modified without also specifying them as output operands.

In the second example, why use the input operands section at all? None of its operands is referenced in the assembly template anyway.

And as a bonus, how could one change the following example from this SO post so that the volatile keyword is no longer needed?

void swap_2 (int *a, int *b)
{
    int tmp0, tmp1;

    __asm__ volatile (
        "movl (%0), %k2\n\t" /* %2 (tmp0) = (*a) */
        "movl (%1), %k3\n\t" /* %3 (tmp1) = (*b) */
        "cmpl %k3, %k2\n\t"
        "jle  %=f\n\t"       /* if (%2 <= %3) (at&t!) */
        "movl %k3, (%0)\n\t"
        "movl %k2, (%1)\n\t"
        "%=:\n\t"

        : "+r" (a), "+r" (b), "=r" (tmp0), "=r" (tmp1) :
        : "memory" /* "cc" */ );
}

Thanks in advance. I've been struggling with this for two days now.

My guess is that `"m" (*x), "m" (*y)` were added as input operands to ensure that the values of *x and *y are actually written to memory before the extended assembly template runs. If you leave them off and pass only the addresses in registers, there is no guarantee that the code generator has stored the data to the memory pointed to by the two registers. The `"m" (*x), "m" (*y)` input constraints ensure that the values are in memory before the inline assembly is executed. This scenario can occur with certain code structures when optimizations are on. – Michael Petch, Oct 25 '17 at 22:35
My comment above applies to the second example as well (vecmul). – Michael Petch, Oct 25 '17 at 22:38
This wouldn't be an issue if the fictitious instructions were allowed to take memory operands as parameters. — Michael Petch, Oct 25 '17 at 22:39
The swap code is very inefficient, but technically the volatile modifier isn't even needed on it to begin with. In fact, volatile would likely lead to less efficient code when optimizing and swap_2 is inlined into other functions. volatile isn't needed because all of the side effects of the assembly template are accounted for by the input, output, and clobber operands. – Michael Petch, Oct 25 '17 at 22:43
On a side note, it is in fact possible to swap the data of two inputs with no code inside the assembly template, using just the constraints to get the work done. That is demonstrated in this answer: https://stackoverflow.com/a/39499595/3857942 . That is a more advanced use of inline assembly. – Michael Petch, Oct 25 '17 at 22:47
@Michael Petch Thanks for your informative comments. You should have posted them as an answer though. :) — listerreg, Oct 26 '17 at 10:33

score 4 · Accepted Answer · answered Oct 26 '17 at 08:52

In the first example, *x and *y have to be listed as input operands so that GCC knows that the outcome of the instruction depends on them. Otherwise, GCC could move stores to *x and *y past the inline assembly fragment, or drop them altogether as dead stores, and the fragment would then read uninitialized memory. This can be seen by compiling this example:

double
f (void)
{
  double result;
  double a = 5;
  double b = 7;
  double *x = &a;
  double *y = &b;
  asm ("sumsq %0, %1, %2"
       : "+X" (result)
       : "r" (x), "r" (y) /*, "m" (*x), "m" (*y)*/);
  return result;
}

Which results in:

f:
    leaq    -16(%rsp), %rax
    leaq    -8(%rsp), %rdx
    pxor    %xmm0, %xmm0
#APP
# 8 "t.c" 1
    sumsq %xmm0, %rax, %rdx
# 0 "" 2
#NO_APP
    ret

The two leaq instructions just set up the registers to point into the uninitialized red zone on the stack. The assignments to a and b are gone.
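For comparison, here is a minimal sketch of the same idea with real x86 instructions in place of the fictitious sumsq. The function name sum_via_regs and the plain-C fallback for non-x86 targets are assumptions made for illustration; they are not from the GCC documentation. The two unnamed "m" inputs are never referenced in the template. Their only job is to tell GCC that the asm reads *x and *y, so the stores to those objects have to be done before the asm runs.

```c
/* Hypothetical helper: adds *x and *y, reading both through base registers. */
static int sum_via_regs(const int *x, const int *y)
{
    int result;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("movl (%[x]), %[r]\n\t"   /* r  = *x */
             "addl (%[y]), %[r]"       /* r += *y */
             : [r] "=&r" (result)      /* early clobber: written before y is read */
             : [x] "r" (x), [y] "r" (y),
               "m" (*x), "m" (*y)      /* dummy inputs: the asm reads this memory */
             : "cc");
#else
    result = *x + *y;                  /* fallback so the sketch builds elsewhere */
#endif
    return result;
}
```

Dropping the two "m" operands would put this function in the same situation as the commented-out example above: with optimization on, GCC would be free to discard stores to the pointed-to objects.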

The same is true for the second example as well.
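A rough sketch of the second pattern with real x86 instructions might look like the following. The name vecmul is borrowed from the fictitious instruction, while the loop, the int element type, and the non-x86 fallback are assumptions for illustration. The pointers are "+r" operands because the asm advances them, "=m" (*z) tells GCC that the element at z is written, and the two "m" inputs tell it that the elements at x and y are read.

```c
/* Hypothetical helper: z[i] = x[i] * y[i] for i in [0, n). */
static void vecmul(int *z, const int *x, const int *y, int n)
{
    while (n-- > 0) {
#if defined(__x86_64__) || defined(__i386__)
        int tmp;
        __asm__ ("movl  (%[x]), %[t]\n\t"   /* t  = *x      */
                 "imull (%[y]), %[t]\n\t"   /* t *= *y      */
                 "movl  %[t], (%[z])\n\t"   /* *z = t       */
                 "add   $4, %[x]\n\t"       /* x++, y++, z++ (4-byte int) */
                 "add   $4, %[y]\n\t"
                 "add   $4, %[z]"
                 : [z] "+r" (z), [x] "+r" (x), [y] "+r" (y),
                   [t] "=&r" (tmp),
                   "=m" (*z)                /* dummy output: memory written */
                 : "m" (*x), "m" (*y)       /* dummy inputs: memory read    */
                 : "cc");
#else
        *z++ = *x++ * *y++;                 /* fallback for other targets */
#endif
    }
}
```

As in the documentation's example, none of the memory operands appears in the template. They exist only to describe the side effects to the optimizer.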

I think you can use the same trick to eliminate the volatile. It is probably not necessary here anyway, because there is already a "memory" clobber, which tells GCC that the inline assembly reads or writes memory.
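One way the trick could be applied to the swap code is sketched below, assuming an x86 target; the operand names and the plain-C fallback are illustrative choices, not taken from the original post. Here *a and *b are passed as "+m" operands, so GCC knows exactly which memory is read and written. That makes both volatile and the "memory" clobber unnecessary, and the asm cannot be discarded as long as the values of *a and *b are used afterwards.

```c
/* Sketch: after the call *a <= *b holds; the two values are swapped if needed. */
static void swap_2(int *a, int *b)
{
#if defined(__x86_64__) || defined(__i386__)
    int tmp0, tmp1;
    __asm__ ("movl %[pa], %[t0]\n\t"   /* t0 = *a */
             "movl %[pb], %[t1]\n\t"   /* t1 = *b */
             "cmpl %[t1], %[t0]\n\t"   /* AT&T order: compares t0 with t1 */
             "jle  %=f\n\t"            /* skip the swap if t0 <= t1 */
             "movl %[t1], %[pa]\n\t"   /* *a = t1 */
             "movl %[t0], %[pb]\n\t"   /* *b = t0 */
             "%=:"
             : [pa] "+m" (*a), [pb] "+m" (*b),
               [t0] "=&r" (tmp0), [t1] "=&r" (tmp1)
             :
             : "cc");
#else
    if (*a > *b) { int t = *a; *a = *b; *b = t; }   /* fallback */
#endif
}
```

The temporaries are marked early-clobber ("=&r") so that they are not placed in a register that is also used to address *a or *b.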

answered Oct 26 '17 at 08:52

Florian Weimer


In the swap code `volatile` isn't needed at all. All the side effects of the inline assembly are accounted for in the constraints. The "memory" clobber ensures that the data is written to memory before the assembly template is executed. – Michael Petch Oct 26 '17 at 09:09
BTW, `volatile` *doesn't* mean you can omit the `"memory"` clobber. `asm volatile` simply means that it's not a pure function of the inputs, i.e. it needs to run as many times as the source says even if the outputs are unused, and not reorder with other `asm volatile`. – Peter Cordes Oct 26 '17 at 09:42
@Florian Weimer Thank you very much. Your example explains it nicely. Out of curiosity what options did you use to get this clean asm code? – listerreg Oct 26 '17 at 10:28
I deleted the irrelevant lines. But you can use `-fno-asynchronous-unwind-tables -O2 -S -o-` for pretty much the same effect. – Florian Weimer Oct 26 '17 at 10:31
@Florian Weimer `gcc -fno-stack-protector -fno-asynchronous-unwind-tables -O2 -S -o-` did the job. Thanks. – listerreg Oct 26 '17 at 10:42
