I'm trying to write a program exactly like this. The only difference is that I'm using the stack memory instead of the .bss section to hold the value I'm getting from the calling function. After this change, I'm getting Bus Error when returning from the assembly function.

Any ideas why?

C-program:

#include<stdio.h>
extern double func(double d);
int main(){
  double d_1 = 1.22;
  double d_2 = func(d_1);
  printf("%lf\n", d_2);
  return 0;
}

Assembly:

section .text
global func
func:
enter   0,0
sub     rsp, 8
movq    qword[rbp],xmm0  ; Store current value in memory  
fld     qword[rbp]       ; Load current value from memory
fld     qword[rbp]       ; Load current value from memory again
fadd                     ; Add top two stack items
leave
ret 
Peter Cordes
Rotem Barak
  • What instruction exactly do you get the bus error on? What values are in registers? Also, in both mainstream x86-64 calling conventions, `double` is returned in `xmm0`. Use `addsd xmm0,xmm0` / `ret` like a normal person. Or if you insist on x87, then `fld` / `fadd st0, st0` – Peter Cordes Apr 18 '18 at 07:42
  • I didn't have a clue about the addsd thing so this is a great help :-) What exactly does it do? And the error occurs when trying to pass $rax to a local variable in main, namely: mov %rax,-0x8(%rbp) – Rotem Barak Apr 18 '18 at 07:44
  • Also, `enter` is slow; compilers avoid it for a reason, and just use `push rbp` / `mov rbp,rsp` if they don't omit the stack frame entirely and just use registers or stack space relative to RSP. `enter` is 12 uops on Skylake, with a throughput of one per 8 cycles (http://agner.org/optimize/), vs. push+mov only being 2 uops with a tiny throughput impact. But if you *are* going to use `enter` (for code-size reasons?), use a non-zero immediate so you don't also need `sub rsp, 8` right after. – Peter Cordes Apr 18 '18 at 07:48
  • Use `si` to step *into* functions, not over them. GDB will tell you what address RIP had when a signal was received. (`addsd` is mentioned in an answer on the question you linked. Look at any x86 instruction reference manual, like http://felixcloutier.com/x86/index.html or other links in https://stackoverflow.com/tags/x86/info.) – Peter Cordes Apr 18 '18 at 07:49
  • If you're programming in NASM, you should use GDB's `set disassembly-flavor intel` to get GAS / binutils' MASM-like syntax. See debugging tips at the bottom of the tag wiki. – Peter Cordes Apr 18 '18 at 07:57

1 Answer

movq [rbp],xmm0 overwrites the saved RBP value that enter pushed. This would be more obvious if you hadn't used enter, but [rbp+0] is not an address you can use in a function with a stack frame.

([rbp-8] is the highest address you can use for locals. [rsp] would have worked, because you decremented RSP after enter set RBP=RSP, but you used RBP.)


When execution returns to main, gcc -O0 (anti-optimized for debugging) runs these instructions to store the function return value from xmm0 into stack space for d_2 instead of just passing it directly to printf while it's still in a register:

movq   rax,xmm0
mov    QWORD PTR [rbp-0x8],rax    # Using RBP after you clobbered it.

Un-optimized gcc output is really silly: copying FP data to an integer register instead of storing directly with movsd makes no sense. But that's not the issue.


RBP holds the IEEE double precision bit-pattern for 1.22 (0x3ff3851eb851eb85) because that's what your func clobbered it with.
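If you want to check that bit pattern yourself, here is a minimal C sketch. The helper name double_bits is made up for illustration, and it assumes double is a 64-bit IEEE binary64 value, as it is on x86-64.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a double.
 * Assumes double is 64-bit IEEE binary64 (true on x86-64). */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* well-defined way to type-pun in C */
    return bits;
}
```

Printing double_bits(1.22) in hex should show the same value GDB shows in RBP after func returns.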

The address rbp-8 is not canonical: the high 16 bits don't match bit 47, so it's not a sign-extended 48-bit virtual address. (See this ASCII-art diagram).
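The canonical-address rule can be sketched in C like this. The function name is_canonical48 is hypothetical, and the check assumes 48-bit virtual addresses (4-level paging); CPUs with 5-level paging use a wider range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a sign-extended 48-bit address,
 * i.e. bits 63..47 are all zero or all one.
 * Assumes 48-bit virtual addresses (4-level paging). */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits from bit 47 up to bit 63 */
    return top == 0 || top == 0x1FFFF;
}
```

With RBP = 0x3ff3851eb851eb85, the address rbp-8 fails this check, while a typical user-space stack address such as 0x00007fffffffe000 passes it.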

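Using a non-canonical address on current x86-64 hardware raises an exception: #GP(0) for most accesses, but #SS(0) when the access goes through the stack segment, as it does here with RBP as the base register (according to Intel's manual entry for mov). Linux delivers SIGBUS for #SS, whereas #GP gets the usual SIGSEGV.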

This is why you get a bus error instead of the usual segmentation fault for trying to access unmapped memory with a bogus pointer.


Your code is over-complicated and wrong

In both mainstream x86-64 calling conventions (Linux/OS X use x86-64 System V), double is returned in xmm0. Use addsd xmm0,xmm0 / ret like a normal person, like the answer on the question you linked shows.

func:
    addsd   xmm0,xmm0   ; first FP arg in (low 64 bits of) xmm0
    ret                 ; return value in (low 64 bits of) xmm0
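For comparison, a C reference version of the same function is handy for checking the asm against. The name func_ref is made up here so it doesn't clash with the asm func. An optimizing compiler targeting x86-64 will typically turn this into the same addsd xmm0,xmm0 / ret, but treat that as an expectation to check in your own compiler output, not a guarantee.

```c
/* Hypothetical C reference for the asm func: returns d doubled.
 * The double arg arrives in xmm0 and the result is returned in xmm0
 * under both mainstream x86-64 calling conventions. */
double func_ref(double d)
{
    return d + d;
}
```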

Or if you insist on x87, then look how much code you have to write:

func:
    movsd  [rsp-8], xmm0      ; double arg in xmm0
    fld    qword [rsp-8]
    fadd   st0, st0           ; use x87 regs instead of uselessly loading twice.
    fstp   qword [rsp-8]      ; empty the x87 stack
    movsd  xmm0, [rsp-8]      ; return value in xmm0
    ret

That's using 8 bytes below RSP (the red zone, which x86-64 System V provides but Windows x64 does not) as scratch space to store/reload data between SSE2 registers and x87, because the x86-64 calling conventions are designed around SSE2, using xmm registers. Use sub rsp, 8 / add rsp, 8 if you don't want to use the red-zone.

Don't use x87 in x86-64 unless you need 80-bit floating-point precision.

(enter is slow and not recommended; make a stack frame with push rbp / mov rbp,rsp if you want one. leave is fine, though. Making a stack frame is optional; I left that out.)


printf doesn't need "%lf" to print a double, only scanf needs lf. You can't printf a single-precision float, because C default promotion rules apply to args of variadic functions, and thus any float is promoted to double.

In most C implementations (including glibc), "%lf" works anyway, silently ignoring the meaningless l modifier on the %f conversion.
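A small C sketch of both points, using made-up helper names (format_f, format_lf, format_float) and snprintf so the results can be compared as strings:

```c
#include <stdio.h>

/* Hypothetical helpers: format a value into buf with a fixed conversion. */
static int format_f(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%f", value);    /* %f already means double */
}

static int format_lf(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%lf", value);   /* the l has no effect here */
}

static int format_float(char *buf, size_t size, float value)
{
    /* value is promoted to double when passed through "...",
     * so "%f" is still the right conversion. */
    return snprintf(buf, size, "%f", value);
}
```

format_f and format_lf should produce identical text for the same double, and format_float works without any float-specific conversion because the callee only ever sees a double.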

I mention this in case you later call printf with a "%f" format string from asm and run into How to print a single-precision float with printf.

Peter Cordes