I am trying to calculate the arctangent of a double-precision floating-point value, which I hold in an xmm register. With normal floating point it is possible to use the old x87 instruction FPATAN, but how can I do this with a double?
The old x87 floating-point stack has higher precision than the xmm registers, so you can just do the calculation there. – MicroVirus Jun 01 '16 at 11:35
Then how can I move an xmm value to the x87 stack? `fld xmm` doesn't work at all. – formateu Jun 01 '16 at 11:37
To move values between `[x/y/z]mm`-registers and the x87-stack, you need to use memory (usually the program stack). – EOF Jun 01 '16 at 11:58
Using a good library is the best way. A properly designed SSE library is much faster than x87, even before counting the time to move data between the xmm registers and the x87 stack. – phuclv Jun 01 '16 at 15:48
["A decent vector math library should be ~10 or so times faster than the x87 transcendental functions (double precision) with errors on the order of ~1 to ~2 ulp."](https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/), ["It turns out that the most optimized math libraries on x86 use SSE software implementations for sin and cos that are faster than the hardware instructions on the FPU"](https://stackoverflow.com/questions/2284860/how-does-c-compute-sin-and-other-math-functions#comment2250614_2284952) – phuclv Jun 01 '16 at 15:54
http://scicomp.stackexchange.com/a/21588 – phuclv Jun 01 '16 at 15:58
1 Answer
You can still copy data from xmm to x87 to use instructions like `fpatan`, but usually you should call a math library function. (`fpatan` is so slow that replacing it with many simple instructions is still a win.) Wikipedia suggests looking at Netlib for a freely redistributable C implementation. (Obviously the easiest way is to just call the function in libm on whatever system you're using.)
If you are going to do it, don't use static storage for the memory you bounce through; use a temporary on the stack.
Also note that `fpatan` takes 2 inputs, because it implements the `atan2` library function, giving a result in the appropriate quadrant depending on the signs of both inputs.
```nasm
; assuming you did sub rsp, 24 or something earlier in your function
movsd [rsp], xmm1
fld qword [rsp]     ; st0 = xmm1
movsd [rsp], xmm0
fld qword [rsp]     ; st0 = xmm0, st1 = xmm1
fpatan              ; st0 = arctan(xmm1/xmm0), i.e. atan2(xmm1, xmm0); pops one entry
fstp qword [rsp]    ; x87 stack is empty again
movsd xmm0, [rsp]   ; xmm0 = arctan(xmm1/xmm0)
; and then add rsp, 24 at some point before returning
```

Peter Cordes
Or in the x86-64 System V ABI, use `[rsp - 8]` or whatever space in the red zone isn't in use. – Peter Cordes Dec 16 '22 at 07:52