How to compute sine values somewhere, and then move them into XMM0 in Assembly?

Question

I previously did this integration task with the FPU; now I'm struggling to do it with SSE.

My main problem is that with the FPU stack there was the fsin instruction, which computes the sine of the value at the top of the stack (st0).

Now I want to calculate the sine of all four numbers in XMM0, or calculate them somewhere else and move the results into XMM0. I'm using AT&T syntax.

I think the second idea is actually possible, but I don't know how :)

Does anybody know how to do it?

sinus? I don't think that means what you think it does (and it's not a verb). — Mahmoud Al-Qudsi, May 13 '12 at 10:25
This answer is relevant: http://stackoverflow.com/a/1845204/1256624 (summary: SSE doesn't appear to provide a native `sin` instruction). Also, this page looks like it might help: http://gruntthepeon.free.fr/ssemath/ — huon, May 13 '12 at 11:06
@dbaupp I know that SSE doesn't provide it, but do you know how to move values from the FPU stack into xmm0? — pawel, May 13 '12 at 11:20
Google turns up [this](http://www.asmcommunity.net/board/index.php?topic=30778.0). (The second link I provided up above appears to have implementations of `sin`/`cos`/etc in SSE; these may even be more performant, due to vectorization and SSE generally being better etc.) — huon, May 13 '12 at 11:27
The fsin (etc.) instructions were pretty bad anyway. They're only useful when optimizing for size, and in that case you probably won't be using SSE. This may be useful: http://devmaster.net/forums/topic/4648-fast-and-accurate-sinecosine/ (add range reduction if your inputs fall outside the range it covers) — harold, May 13 '12 at 12:46

score 4 · Accepted Answer · answered May 13 '12 at 12:46

Three options:

1. Use an existing library that computes sin on SSE vectors.
2. Write your own vector sin function using SSE.
3. Store the vector to memory, use fsin to compute the sine of each element, and load the results. Assuming that %rsp is 16-byte aligned and the 16 bytes at (%rsp) are free to use as scratch space, something like this:
```
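   movaps  %xmm0,  (%rsp)         # spill the four floats to the stack
   mov     $3,     %rcx           # lane index: 3, 2, 1, 0
0: flds    (%rsp,%rcx,4)          # push lane rcx onto the x87 stack (st0)
   fsin                           # st0 = sin(st0); only valid for |x| < 2^63
   fstps   (%rsp,%rcx,4)          # store the result back into the lane and pop
   sub     $1,     %rcx
   jns     0b                     # loop while rcx >= 0
   movaps  (%rsp), %xmm0          # reload all four sines into xmm0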
```
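If the caller is C rather than hand-written assembly, the same spill, fsin per lane, reload pattern can be sketched with SSE intrinsics plus one line of inline assembly. This assumes GCC on x86/x86-64 (the `"t"` constraint names st0) and a C11 compiler for `_Alignas`; `x87_sin` and `sin_ps_via_x87` are hypothetical helper names, not part of any library.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_store_ps */

/* Sine of one float using the x87 fsin instruction (GCC inline asm, x86 only).
   "t" means the top of the x87 stack (st0); "0" ties the input to the same place.
   fsin only accepts |x| < 2^63; outside that range it leaves st0 unchanged. */
static float x87_sin(float x)
{
    float r;
    __asm__("fsin" : "=t"(r) : "0"(x));
    return r;
}

/* Option 3: spill the vector, take the sine of each lane, reload it. */
static __m128 sin_ps_via_x87(__m128 v)
{
    _Alignas(16) float lanes[4];  /* aligned scratch, like the 16 bytes at (%rsp) */
    _mm_store_ps(lanes, v);       /* typically a movaps store */
    for (int i = 0; i < 4; ++i)
        lanes[i] = x87_sin(lanes[i]);
    return _mm_load_ps(lanes);    /* typically a movaps load */
}
```

The compiler inserts the moves between the XMM registers and the x87 stack itself. There is no instruction that copies directly between the two register files, so the data always passes through memory, just as in the assembly above.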

(1) is almost certainly your best bet performance-wise, and it is also the easiest. If you have significant experience writing vector code and know a priori that the arguments fall into a limited range, you may be able to get better performance with (2). Using fsin will work, but it's ugly, slow, and not particularly accurate, if that matters: it only accepts arguments with |x| < 2^63, and its internal argument reduction uses a 66-bit approximation of π, so precision degrades for large arguments.
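For option (2), here is a rough sketch of what a hand-written vector sine can look like, in C with SSE2 intrinsics (these map nearly one-to-one onto the `mulps`/`addps`/`cmpps`/`cvtps2dq` instructions you would write in AT&T syntax). It is an illustration, not the method used by any particular library: it reduces the argument to about [-π, π] with a single float 2π, folds it into [-π/2, π/2], and evaluates the Taylor terms up to x^9 with Horner's rule, which bounds the truncation error by (π/2)^11/11!, roughly 3.6e-6. It assumes inputs of modest magnitude and the default round-to-nearest MXCSR mode; `sin_ps_poly` and `select_ps` are hypothetical names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* mask ? a : b per lane (SSE4.1 blendvps is not assumed). */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Approximate sin() of four floats at once. Sketch only: assumes |x| is
   modest (a single-float 2*pi reduction loses accuracy as |x| grows) and
   that MXCSR is in its default round-to-nearest mode. */
static __m128 sin_ps_poly(__m128 x)
{
    const __m128 inv_two_pi  = _mm_set1_ps(0.15915494f);
    const __m128 two_pi      = _mm_set1_ps(6.28318531f);
    const __m128 pi          = _mm_set1_ps(3.14159265f);
    const __m128 neg_pi      = _mm_set1_ps(-3.14159265f);
    const __m128 half_pi     = _mm_set1_ps(1.57079633f);
    const __m128 neg_half_pi = _mm_set1_ps(-1.57079633f);

    /* 1. r = x - 2*pi*k with k = round(x / (2*pi)), so r lies in about [-pi, pi]. */
    __m128 k = _mm_cvtepi32_ps(_mm_cvtps_epi32(_mm_mul_ps(x, inv_two_pi)));
    __m128 r = _mm_sub_ps(x, _mm_mul_ps(k, two_pi));

    /* 2. Fold into [-pi/2, pi/2] using sin(pi - r) = sin(r) = sin(-pi - r). */
    r = select_ps(_mm_cmpgt_ps(r, half_pi), _mm_sub_ps(pi, r), r);
    r = select_ps(_mm_cmplt_ps(r, neg_half_pi), _mm_sub_ps(neg_pi, r), r);

    /* 3. r - r^3/3! + r^5/5! - r^7/7! + r^9/9!, Horner's rule in r^2. */
    __m128 r2 = _mm_mul_ps(r, r);
    __m128 p  = _mm_set1_ps(1.0f / 362880.0f);
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.0f / 5040.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(1.0f / 120.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.0f / 6.0f));
    return _mm_add_ps(r, _mm_mul_ps(_mm_mul_ps(r, r2), p));
}
```

Replacing the Taylor coefficients with a minimax fit, or splitting 2π into a high and a low part for the reduction step, are common refinements if you need more accuracy over a wider input range.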
