movq [rbp],xmm0 overwrites the saved RBP value that enter pushed. This would be more obvious if you hadn't used enter, but [rbp+0] is not an address you can use in a function with a stack frame. ([rbp-8] is the highest address you can use for locals. [rsp] would have worked, because you decremented RSP after enter set RBP=RSP, but you used RBP.)
When execution returns to main, gcc -O0 (anti-optimized for debugging) runs these instructions to store the function return value from xmm0 into stack space for d_2 instead of just passing it directly to printf while it's still in a register:
    movq   rax,xmm0
    mov    QWORD PTR [rbp-0x8],rax    # Using RBP after you clobbered it.
Un-optimized gcc output is really silly: copying FP data to an integer register instead of storing directly with movsd makes no sense. But that's not the issue.
RBP holds the IEEE double-precision bit-pattern for 1.22 (0x3ff3851eb851eb85) because that's what your func clobbered it with.
The address rbp-8 is not canonical: the high 16 bits don't match bit 47, so it's not a sign-extended 48-bit virtual address. (See this ASCII-art diagram).
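Using a non-canonical address on current x86-64 hardware raises an exception. Intel's manual entry for mov lists #GP(0) for a non-canonical memory address in general, and #SS(0) when the access goes through the stack segment, which is the default for an addressing mode with RBP as the base register, as here. Linux maps the stack-segment exception to SIGBUS, whereas a plain #GP would be delivered as SIGSEGV.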
This is why you get a bus error instead of the usual segmentation fault for trying to access unmapped memory with a bogus pointer.
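The canonical-address reasoning above can be sketched in C. This is only an illustration: the helper names double_bits and is_canonical_48 are made up here, and the check assumes 48-bit virtual addresses and a 64-bit IEEE double, as on the x86-64 systems discussed in this answer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw IEEE-754 bit pattern of a double.
   Assumes double is 64 bits wide, as it is on x86-64. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* well-defined way to reinterpret bytes */
    return bits;
}

/* Hypothetical helper: with 48-bit virtual addresses (an assumption),
   an address is canonical when bits 63..47 are all zeros or all ones. */
static int is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;        /* the 17 bits from bit 63 down to bit 47 */
    return top == 0 || top == 0x1FFFF;
}
```

Here double_bits(1.22) - 8 models the value of rbp-8 after the clobber, and is_canonical_48 rejects it, while a typical user-space stack address (low half, upper bits all zero) passes.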
Your code is over-complicated and wrong
In both mainstream x86-64 calling conventions (Linux/OS X use x86-64 System V), double is returned in xmm0. Use addsd xmm0,xmm0 / ret like a normal person, like the answer on the question you linked shows.
func:
    addsd   xmm0,xmm0       ; first FP arg in (low 64 bits of) xmm0
    ret                     ; return value in (low 64 bits of) xmm0
Or if you insist on x87, then look how much code you have to write:
func:
    movsd   [rsp-8], xmm0       ; double arg in xmm0
    fld     qword [rsp-8]
    fadd    st0, st0            ; use x87 regs instead of uselessly loading twice.
    fstp    qword [rsp-8]       ; empty the x87 stack
    movsd   xmm0, [rsp-8]       ; return value in xmm0
    ret
That uses the 8 bytes below RSP, in the red zone, as scratch space to store/reload the data between SSE2 registers and x87. The round trip through memory is needed because the x86-64 calling conventions are designed around SSE2 and pass doubles in xmm registers. Use sub rsp, 8 / add rsp, 8 if you don't want to use the red zone (or can't: Windows x64 doesn't have one).
Don't use x87 in x86-64 unless you need 80-bit floating-point precision.
(enter is slow and not recommended; make a stack frame with push rbp / mov rbp,rsp if you want one. leave is fine, though. Making a stack frame is optional; I left that out.)
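A quick way to sanity-check a hand-written function like this is to write the same thing in C and read the compiler's asm output. A minimal sketch, with func_c as a made-up name: with optimization enabled for x86-64 (for example gcc -O2 -S), I'd expect the body to be essentially the same addsd / ret pair, printed in AT&T syntax unless you pass -masm=intel, but check what your own compiler version produces.

```c
/* Hypothetical C counterpart of the asm func above:
   takes a double and returns twice its value. */
double func_c(double x)
{
    return x + x;
}
```

The argument arrives in xmm0 and the result is returned in xmm0, so an optimizing compiler has no reason to touch memory or the x87 stack at all.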
printf doesn't need "%lf" to print a double; only scanf needs lf. You can't printf a single-precision float, because C default promotion rules apply to args of variadic functions, and thus any float is promoted to double.
"%lf" works in printf anyway: since C99 the l modifier is defined to have no effect on the %f conversion, and most C implementations (including glibc) silently ignore it.
I mention this in case you later try to call printf with a "%f" format string from asm and run into How to print a single-precision float with printf.
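The promotion rule is easy to see from the C side. A small sketch (format_three_ways is a hypothetical helper written for this illustration, not something from the question): the same "%f" conversion handles both a float and a double argument, because the float has already been widened by the time printf-family functions see it.

```c
#include <stdio.h>

/* Hypothetical demo helper: format the same value three ways into out[0..2]. */
static void format_three_ways(char out[3][32])
{
    float  f = 1.5f;
    double d = 1.5;

    snprintf(out[0], sizeof out[0], "%f",  f);  /* float is promoted to double by the call */
    snprintf(out[1], sizeof out[1], "%f",  d);  /* double passed as-is */
    snprintf(out[2], sizeof out[2], "%lf", d);  /* l has no effect on %f in printf (C99 and later) */
}
```

All three buffers end up holding the same text. In asm nothing does that promotion for you, so a single-precision value has to be widened (for example with cvtss2sd) before the call.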