
I know that in 32-bit assembly, one can compute a power (the equivalent of pow(double, double) in C) using a combination of the x87 instructions FYL2X, F2XM1 and FSCALE.

In 64-bit assembly, however, I read that use of the x87 math coprocessor is deprecated and that the SSE2 instructions should be used instead. While I was able to find instructions like ADDSD, MULSD and DIVSD that operate on XMM registers, I could not find anything related to powers, exponentials or logarithms (the closest I found was SQRTSD, which doesn't help much here).

So how can powers (a^b, where both a and b are floating-point) be computed using SSE2 instructions? Can it even be done, or do you need to resort to software computation or the x87?

Peter Cordes
DarkAtom
  • https://stackoverflow.com/questions/4431505/sse2-double-precision-log-function – Hans Passant Mar 06 '20 at 18:49
  • It's done in software, using a sequence of SSE and SSE2 instructions, and even then it's faster than the x87 FPU. – Iwillnotexist Idonotexist Mar 06 '20 at 18:50
  • @IwillnotexistIdonotexist How can I do the computation in software? I just want to make a power function from scratch: I am only linking my project with the Win32 library for the I/O, so I don't have access to other libraries like the Microsoft C Runtime for using `pow`. I am quite a noob when it comes to coding math functions, though. I am more interested in precision than in speed. – DarkAtom Mar 06 '20 at 18:56
  • 1
    @DarkAtom The easiest way to program your own `pow` is to use x87 instructions. It's not going to be much slower than a software implementation, but it's a lot easier to implement. Software implementations are kinda tricky as they need to be implemented very carefully to be precise over the whole range of floating point numbers. – fuz Mar 06 '20 at 19:06
  • The usual pure SSE2 software technique takes advantage of the exponential/logarithmic nature of binary floating point by manipulating the exponent field (using `psrld/q` for example), and a polynomial approximation for log(x) and exp(x) for the mantissa over the range x=0.5..1 or 0.5 .. 2. You can make a scalar version of any of these SIMD answers: [pow for SSE types](https://stackoverflow.com/q/25936031) / [Efficient implementation of log2(\_\_m256d) in AVX2](https://stackoverflow.com/q/45770089) – Peter Cordes Mar 06 '20 at 19:17
  • 1
    @fuz: It depends what kind of speed / precision tradeoff you're looking for. If you don't need to get close to 1 ulp (a mantissa bit) of precision, you don't need to be super careful. One advantage of rolling your own is that you get to choose that tradeoff for your use-case, and can for example ignore NaN or +-Inf inputs if that doesn't matter for your application. – Peter Cordes Mar 06 '20 at 19:21
  • Had to do this for a video game that had no floating point hardware (hooray Nintendo DS). ln(x) and exp(x) are actually pretty simple to code as series expansions. 3 or 4 iterations were plenty accurate for my needs, but you can run them deeper for more accuracy if needed. Works fixed point just fine too. – Michael Dorgan Mar 06 '20 at 20:16

0 Answers