"=r" is the constraint for GP integer registers.
The GCC manual claims that "=w" is the constraint for an FP / SIMD register on AArch64. But if you try that, you get v0, not s0, which won't assemble. I don't know a workaround here; you should probably report on the GCC bugzilla that the constraint documented in the manual doesn't work for scalar FP.
On Godbolt I tried this source:
float foo()
{
float sum;
#ifdef __aarch64__
asm volatile("fadd %0, s3, s4" : "=w"(sum) : :); // AArch64
#else
asm volatile("fadds %0, s3, s4" : "=t"(sum) : :); // ARM32
#endif
return sum;
}
double dsum()
{
double sum;
#ifdef __aarch64__
asm volatile("fadd %0, d3, d4" : "=w"(sum) : :); // AArch64
#else
asm volatile("faddd %0, d3, d4" : "=w"(sum) : :); // ARM32
#endif
return sum;
}
clang 7.0 (with its built-in assembler) requires the asm to actually be valid. But for gcc we're only compiling to asm, and Godbolt doesn't have a "binary" mode for non-x86 targets.
# AArch64 gcc 8.2 -xc -O3 -fverbose-asm -Wall
# INVALID ASM, errors if you try to actually assemble it.
foo:
fadd v0, s3, s4 // sum
ret
dsum:
fadd v0, d3, d4 // sum
ret
clang produces the same asm, and its built-in assembler errors with:
<source>:5:18: error: invalid operand for instruction
asm volatile("fadd %0, s3, s4" : "=w"(sum) : :);
^
<inline asm>:1:11: note: instantiated into assembly here
fadd v0, s3, s4
^
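One thing that may be worth trying (not verified on Godbolt as part of this answer): GCC's AArch64 backend has operand modifiers that print an FP/SIMD register operand under a specific scalar name, so `%s0` should print s0 and `%d0` should print d0 instead of the default v0. The sketch below also passes real inputs through "w" constraints instead of reading s3/s4. The function names are hypothetical, and the `#else` branch is plain C so the file still builds on other targets.

```c
/* Sketch: scalar FP add on AArch64 via inline asm.
 * Assumption: the %s / %d operand modifiers print a "w" operand as
 * s<N> / d<N> instead of the default v<N>. */
float fadd_s(float a, float b)
{
#ifdef __aarch64__
    float sum;
    /* "w" picks any FP/SIMD register; %s0 names it as s<N> */
    asm("fadd %s0, %s1, %s2" : "=w"(sum) : "w"(a), "w"(b));
    return sum;
#else
    return a + b;   /* portable fallback for non-AArch64 targets */
#endif
}

double fadd_d(double a, double b)
{
#ifdef __aarch64__
    double sum;
    /* same register class, printed as d<N> for a double-precision add */
    asm("fadd %d0, %d1, %d2" : "=w"(sum) : "w"(a), "w"(b));
    return sum;
#else
    return a + b;   /* portable fallback for non-AArch64 targets */
#endif
}
```

With real inputs and a used output, `volatile` isn't needed: the compiler can treat each asm statement as a pure function of its inputs.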
On 32-bit ARM, "=t" for single precision works, but "=w" for double (which the manual says you should use for double precision) also gives you s0 with gcc. clang gets it right, though. You have to use -mfloat-abi=hard and a -mcpu= for something with an FPU, e.g. -mcpu=cortex-a15.
# clang7.0 -xc -O3 -Wall --target=arm -mcpu=cortex-a15 -mfloat-abi=hard
# valid asm for ARM 32
foo:
vadd.f32 s0, s3, s4
bx lr
dsum:
vadd.f64 d0, d3, d4
bx lr
But gcc fails:
# ARM gcc 8.2 -xc -O3 -fverbose-asm -Wall -mfloat-abi=hard -mcpu=cortex-a15
foo:
fadds s0, s3, s4 @ sum
bx lr @
dsum:
faddd s0, d3, d4 @ sum @@@ INVALID
bx lr @
So you can use "=t" for single precision just fine with gcc, but for double you presumably need a %something0 modifier on the "=w" output to print the register name as d0 instead of s0.
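For the 32-bit double case, I believe GCC's ARM backend has a `P` operand modifier that prints a 64-bit VFP operand as d<N>, so `%P0` may be the modifier in question; treat that as an assumption and check the compiler's asm output. The sketch below (hypothetical function names) uses the question's pre-UAL mnemonics, passes inputs through constraints, and checks the ACLE macro `__ARM_FP` (bit 2 = single-precision, bit 3 = double-precision hardware) so other targets fall back to plain C.

```c
/* Sketch: VFP scalar add on 32-bit ARM via inline asm. */
float vfp_add_s(float a, float b)
{
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 4)
    float sum;
    /* "t" = single-precision VFP register, printed as s<N> */
    asm("fadds %0, %1, %2" : "=t"(sum) : "t"(a), "t"(b));
    return sum;
#else
    return a + b;   /* portable fallback: no single-precision VFP */
#endif
}

double vfp_add_d(double a, double b)
{
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 8)
    double sum;
    /* Assumption: %P prints a 64-bit "w" operand as d<N>;
       plain %0 came out as s0 with gcc 8.2. */
    asm("faddd %P0, %P1, %P2" : "=w"(sum) : "w"(a), "w"(b));
    return sum;
#else
    return a + b;   /* portable fallback: no double-precision VFP */
#endif
}
```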
Obviously, these asm statements are only useful beyond learning the syntax if you also add constraints for the input operands, instead of reading whatever happened to be sitting in s3 and s4.
See also https://stackoverflow.com/tags/inline-assembly/info