
How do I return a 64-bit value from an assembly function? I tried this:

C-program:

#include <stdio.h>

double result=0;
double a = 10;
extern double func(double a);

int main() {
    result = func(a);
    printf("result: %f\n", result);
    return 0;
}

Assembly:

      section .bss
      x: resq 1

      section .text 

      global func

      func:

      movq qword[x],xmm0
      fld qword [x]
      fld qword [x]
      fadd
      movq xmm0,qword[x]

      ret

It should return 20.0, but instead it always returns 10.0. What did I do wrong?

falcon
  • You never store the result. And btw, it would probably be simpler to use SSE instead of x87. – ElderBug Jun 09 '16 at 15:50
  • Elderbug is correct. `fadd` added st(0) and st(1) storing the value in `st(0)`. You don't update the value in [x] with the value at the top of the stack. You also don't pop the extra values off the FPU stack before returning (This will cause a problem overflowing the FPU stack if you call this function 5 times). As Elder points out you could use SSE instead of FPU unless the assignment given demanded you to do it this way. – Michael Petch Jun 09 '16 at 17:52
  • The code could also be simplified, but I am curious are you compiling for 32-bit or 64-bit code? I'm assuming from the result you got, your executable is intended to be 64-bit. – Michael Petch Jun 09 '16 at 17:55
  • With 64-bit code you can use the redzone on the stack for temporarily storing the value you intend to load onto the FPU stack. Code like this may work: `section .text global func func: movq [rsp-8],xmm0 fld qword [rsp-8] fadd qword [rsp-8] fstp qword [rsp-8] movq xmm0,[rsp-8] ret` . I use `fadd` with a 64bit memory operand with result in st(0), then use `fstp` to pop the value off FPU stack back into memory location and then copy that value to xmm0. `fstp` will pop the one and only value I loaded on the FPU stack in this example. – Michael Petch Jun 09 '16 at 18:14
  • If you could use SSE you can use the [ADDSD](http://www.felixcloutier.com/x86/ADDSD.html) instruction like this `section .text; global func; func: addsd xmm0, xmm0; ret` . That would add the scalar double in xmm0 to xmm0 storing the result in xmm0. – Michael Petch Jun 09 '16 at 18:27
  • The best thing you could do though is take your original code, and run it in a debugger like _GDB_ and watch what happens as you step through your function an instruction at a time. Using a debugger is a very good skill to adopt. – Michael Petch Jun 09 '16 at 18:33
  • The redzone exists on Linux, I may have incorrectly assumed your target platform. – Michael Petch Jun 09 '16 at 19:15
  • @MichaelPetch: I *think* Windows mangles C function names with an `_`, so this is probably Linux. falcon: See the [x86 tag wiki](http://stackoverflow.com/tags/x86/info) for links to calling conventions and other essential stuff. – Peter Cordes Jun 10 '16 at 02:10
  • @PeterCordes : Yes, my main point was an indirect way to suggest it always helps to tell us the target environment in a question. Not likely, but it would have been possible to avoid the underscore on GCC for Win32 by using something like `-fno-leading-underscore` . I highly doubt he is, but anything is possible. – Michael Petch Jun 10 '16 at 02:15

2 Answers


@Michael Petch noted that the whole function could be much more efficient with the following code:

addsd xmm0, xmm0   ; Add input parameter to itself
ret                ; Done!  (return values go in xmm0)

x86-64 passes/returns double in XMM registers, not memory or the x87 stack. (Applies to both the x86-64 System V ABI/calling convention, and Windows x64. See links in the x86 tag wiki)
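
For comparison, here is a minimal C sketch of what the assembly function is meant to compute. The name func_ref is hypothetical (it is not in the original post); it is only a reference to check the assembly against, assuming an x86-64 target where the double argument and the return value both travel in xmm0.

```c
/* Reference version in C. func_ref is a hypothetical name chosen for
   illustration; the original post only declares func(). */
double func_ref(double a)
{
    /* On x86-64 the argument arrives in xmm0 and the result is returned
       in xmm0, so an optimizing compiler will typically reduce this to
       a single addsd xmm0, xmm0 followed by ret. */
    return a + a;
}
```

Compiling this with something like gcc -O2 -S and reading the generated assembly is a quick way to see how the compiler itself handles the calling convention.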


The code posted didn't have comments. Commenting it would have helped the OP, so...

;; Buggy original version with comments
movq qword[x],xmm0  ; Store current value in memory  [Why?]
fld qword [x]       ; Load current value from memory [Why??]
fld qword [x]       ; Load current value from memory again
fadd                ; Add top two stack items

movq xmm0,qword[x]  ; reload original value from memory, unmodified

@ElderBug noted that the OP forgot to store the result of the fadd into memory before doing the final movq, so this function simply returns its input, like double foo(double x) { return x; } but leaving garbage on the x87 stack.


@Michael Petch went on to note that the original code left a large amount of 'debris' on the floating point stack - there was no attempt to clean it up with various pop versions of the instructions (fstp, or faddp instead of fadd). This leaves less room for the next floating point function - until finally a floating-point stack overflow is caused, resulting in an unexpected NaN!

Peter Cordes
John Burger

You cannot mix FPU and XMM calculations directly. When you calculate something on the FPU, you must store it in memory (as @ElderBug said) and then load it into an XMM register to return it from a 64-bit function on x64 Windows. Using the FPU can still have an advantage on 64-bit systems, because the internal precision of the FPU can be 80 bits if you set the precision-control field (bits 8 and 9) of the FPU control word accordingly:

  • 00b = single precision (24-bit mantissa)
  • 10b = double precision (53-bit mantissa)
  • 11b = extended precision (64-bit mantissa)
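
To make the bit layout concrete, here is a small C sketch that reads and replaces the precision-control field in a control-word value. The helper and enum names are hypothetical, and the code only manipulates a 16-bit integer; it does not touch the real FPU (that would need the fnstcw and fldcw instructions). The value 0x037F used in the note below is the control word that FNINIT loads.

```c
#include <stdint.h>

/* Hypothetical names for the three precision-control encodings
   listed above (bits 8 and 9 of the x87 control word). */
enum x87_precision {
    X87_PC_SINGLE   = 0x0, /* 00b: 24-bit mantissa */
    X87_PC_DOUBLE   = 0x2, /* 10b: 53-bit mantissa */
    X87_PC_EXTENDED = 0x3  /* 11b: 64-bit mantissa */
};

/* Extract the precision-control field from a control-word value. */
static unsigned x87_get_precision(uint16_t cw)
{
    return (cw >> 8) & 0x3u;
}

/* Return a copy of cw with the precision-control field replaced by pc. */
static uint16_t x87_set_precision(uint16_t cw, unsigned pc)
{
    return (uint16_t)((cw & ~0x0300u) | ((pc & 0x3u) << 8));
}
```

With these helpers, x87_get_precision(0x037F) gives X87_PC_EXTENDED, and x87_set_precision(0x037F, X87_PC_DOUBLE) gives 0x027F.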

If you want to use the FPU:

fld QWORD PTR x   ; load var to FPU: into ST(0)  (MASM Syntax)
fadd ST(0), ST(0)   ; this adds [x]+[x]
fstp QWORD PTR x  ; store result back in var
movsd xmm0, QWORD PTR x

NOTE: movsd always requires SSE2. (On SSE1-only machines an invalid-opcode (#UD) exception will occur! See the Intel® 64 and IA-32 Architectures Software Developer’s Manual: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html) However, if you run Windows 8/8.1/10, that is never an issue for you, because the OS itself lists SSE2 as a system requirement.

EDIT: SSE2 is baseline in x86-64 (as stated by Peter Cordes in the comments), so you can always use it in 64-bit code.

If you want to use SIMD with XMM registers:

movsd xmm0, QWORD PTR x
addsd xmm0, xmm0   ; this instruction also requires SSE2   
; ok, return xmm0

Also note that you cannot mix XMM and MMX registers either! (The instructions MOVQ2DQ and MOVDQ2Q can move a value from one to the other, but other instructions can't.)

If your function uses parameters and should run on a Windows operating system, you need to ensure a valid function prolog/epilog. See: https://future2048.blogspot.com

  • x86-64 has SSE2 as a baseline. Your SSE1 caveat only applies if you're doing this in 32-bit code. (And you probably shouldn't use a calling convention that returns doubles in xmm regs in code that might run on a machine without SSE2. But if you did want to with only SSE1, you could still use [`movlps`](http://www.felixcloutier.com/x86/MOVLPS.html), but don't because that would merge with the old contents of the register instead of zeroing the upper 64 bits. (false dependency on the old value of the xmm register)) – Peter Cordes Aug 26 '16 at 23:32
  • Also, re: initial setting of the x87 precision control: See [this one of Bruce Dawson's excellent series of articles about FP](https://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/). VC++'s CRT code sets the x87 unit to 53-bit mantissa precision, at least for 32-bit executables. I didn't re-read the whole article to see when it leaves it alone. And I think I've read something about directx changing x87 precision. So glad I don't use Windows. :) – Peter Cordes Aug 26 '16 at 23:36