I'm trying to calculate the arctangent of a double-precision float value that I hold in an xmm register. With ordinary x87 floating point it is possible to use the old x87 instruction FPATAN, but how can I do this with a double in an xmm register?

formateu
  • The old x87 floating point stack has higher precision than the xmm registers, so you can just do the calculation there. – MicroVirus Jun 01 '16 at 11:35
  • Then how can I move an xmm value onto the x87 stack? `fld xmm` doesn't work at all. – formateu Jun 01 '16 at 11:37
  • To move values between `[x/y/z]mm`-registers and the x87-stack, you need to use memory (usually the program stack). – EOF Jun 01 '16 at 11:58
  • Using a good library is the best way. A properly designed SSE library is much faster than x87, even without counting the time to move data between the two register files. – phuclv Jun 01 '16 at 15:48
  • ["A decent vector math library should be ~10 or so times faster than the x87 transcendental functions (double precision) with errors on the order of ~1 to ~2 ulp."](https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/), ["It turns out that the most optimized math libraries on x86 use SSE software implementations for sin and cos that are faster than the hardware instructions on the FPU"](https://stackoverflow.com/questions/2284860/how-does-c-compute-sin-and-other-math-functions#comment2250614_2284952) – phuclv Jun 01 '16 at 15:54
  • http://scicomp.stackexchange.com/a/21588 – phuclv Jun 01 '16 at 15:58

1 Answer

You can still copy data from xmm to x87 to use instructions like `fpatan`, but usually you should call a math library function instead. (`fpatan` is so slow that replacing it with many simple instructions is still a win.) Wikipedia suggests looking at Netlib for a freely redistributable C implementation. (Obviously the easiest way is to just call the function in libm on whatever system you're using.)


If you are going to use x87 anyway, don't use static storage for the memory you bounce through; use a temporary on the stack.

Also note that fpatan takes 2 inputs, because it implements the atan2 library function, giving a result in the appropriate quadrant depending on the sign of both inputs.

    ; assuming you did  sub  rsp, 24   or something earlier in your function

    movsd   [rsp], xmm1
    fld     qword [rsp]   ; st0 = xmm1
    movsd   [rsp], xmm0
    fld     qword [rsp]   ; st0 = xmm0,  st1 = xmm1

    fpatan                ; st0 = arctan(xmm1/xmm0)

    fstp    qword [rsp]   ; x87 stack is empty again
    movsd   xmm0, [rsp]   ; xmm0 = arctan(xmm1/xmm0)

    ; and then   add rsp, 24   at some point before returning
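If you're in C rather than hand-written asm, GNU C inline asm can let the compiler do the xmm-to-x87 bounce for you. This is only a sketch, assuming GCC or Clang targeting x86; the function name `atan2_x87` is made up, and on any other compiler or target it simply falls back to the library `atan2`. The `"t"` and `"u"` constraints name `st(0)` and `st(1)`, and `st(1)` is listed as clobbered because `fpatan` pops one of its inputs.

```c
#include <math.h>

/* Hypothetical helper: atan2(y, x) computed with the x87 fpatan instruction.
   Assumes GNU C extended asm on x86; elsewhere it falls back to atan2(). */
double atan2_x87(double y, double x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    double result;
    __asm__ ("fpatan"            /* st(0) = arctan(st(1) / st(0)), pops once */
             : "=t" (result)     /* result is left in st(0) */
             : "0" (x),          /* x goes in st(0) */
               "u" (y)           /* y goes in st(1) */
             : "st(1)");         /* the popped input register */
    return result;
#else
    return atan2(y, x);          /* portable fallback */
#endif
}
```

The compiler picks where to spill the doubles, which in practice is a stack temporary, the same idea as the hand-written version. Calling the libm function directly is still what you would normally do.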
Peter Cordes