SSE: convert __m128 to float

Question

I have the following piece of C code:

__m128 pSrc1 = _mm_set1_ps(4.0f);
__m128 m1, m2, m3;
__m128 pDest;
int i;
for (i = 0; i < 100; i++) {
    m1 = _mm_mul_ps(pSrc1, pSrc1);
    m2 = _mm_mul_ps(pSrc1, pSrc1);
    m3 = _mm_add_ps(m1, m2);
    pDest = _mm_add_ps(m3, m3);
}

float *arrq = (float*) pDest;

Everything up to the end of the for loop works. What I am trying to do now is convert the __m128 value back to float. Since it stores 4 floats, I thought I could simply cast it to float*. What am I doing wrong? (This is just test code, so don't be surprised by what it computes.) I have tried every conversion I could think of. Thanks for your help.

score 11 · Accepted Answer · edited Oct 19 '22 at 06:41

You can use _mm_store_ps to store a __m128 vector into a float array.

alignas(16) float result [4];
_mm_store_ps (result, pDest);

// If result is not 16-byte aligned, use _mm_storeu_ps
// On modern CPUs this is just as fast as _mm_store_ps if
// result is 16-byte aligned, but works in all other cases as well
_mm_storeu_ps (result, pDest);

You can then access any / all elements from that temporary array, and if you're lucky the compiler will turn this into a shuffle instead of store/reload if that's more efficient. (If the destination isn't just a temporary and you actually want all 4 elements stored somewhere, then _mm_storeu_ps or store is exactly what you want.)
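A minimal sketch of the store-then-index approach, wrapped in two small functions. The helper names `vec_to_floats` and `vec_lane` are hypothetical, chosen only for illustration; the intrinsics themselves come from `<xmmintrin.h>`.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_storeu_ps */

/* Hypothetical helper: copy all four lanes of v into out[0..3].
   _mm_storeu_ps has no alignment requirement, so out may point anywhere. */
void vec_to_floats(__m128 v, float out[4])
{
    _mm_storeu_ps(out, v);
}

/* Hypothetical helper: read lane i (0 = lowest, 3 = highest)
   by going through a temporary array. */
float vec_lane(__m128 v, int i)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    return tmp[i];
}
```

With the question's code, `vec_to_floats(pDest, result)` would fill `result` with the four lanes of `pDest`. Lane 0 is the lowest element, which is the last argument of `_mm_set_ps`.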

If you want just the low element, float _mm_cvtss_f32(__m128) is good.
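A minimal sketch, assuming you want a single lane and would rather not go through memory: `_mm_cvtss_f32` reads lane 0, and a shuffle can move another lane down to position 0 first. The helper names `vec_low` and `vec_lane2` are hypothetical. The shuffle selector has to be a compile-time constant, so this sketch hard-codes lane 2.

```c
#include <xmmintrin.h>  /* SSE: _mm_shuffle_ps, _mm_cvtss_f32, _MM_SHUFFLE */

/* Hypothetical helper: return the lowest lane (lane 0) of v. */
float vec_low(__m128 v)
{
    return _mm_cvtss_f32(v);
}

/* Hypothetical helper: return lane 2 of v.
   The shuffle broadcasts lane 2 into every position,
   then _mm_cvtss_f32 reads it from lane 0. */
float vec_lane2(__m128 v)
{
    __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    return _mm_cvtss_f32(t);
}
```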

If you want to combine the vector elements down to a single float after a loop that sums an array or does a dot-product, see Fastest way to do horizontal SSE vector sum (or other reduction)
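As a rough illustration of such a reduction, here is one common SSE1-only pattern for summing the four lanes. It is a sketch under the assumption that only SSE1 is available, and not necessarily the variant the linked question recommends for your target. The function name `hsum_ps` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE: shuffle, movehl, add, cvtss */

/* Hypothetical helper: return v[0] + v[1] + v[2] + v[3]. */
float hsum_ps(__m128 v)
{
    /* v    = [a, b, c, d]  (lane 0 listed first) */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* [b, a, d, c] */
    __m128 sums = _mm_add_ps(v, shuf);      /* [a+b, a+b, c+d, c+d] */
    shuf = _mm_movehl_ps(shuf, sums);       /* lanes 0 and 1 now hold c+d */
    sums = _mm_add_ss(sums, shuf);          /* lane 0 = (a+b) + (c+d) */
    return _mm_cvtss_f32(sums);
}
```

Note that the lanes are added pairwise, so for inputs where rounding matters the result can differ slightly from a left-to-right scalar sum.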

Thanks a lot. That was quite easy. I am new to the field, so sorry for the stupid question — , Jan 16 '13 at 20:53
[Watch out with stack variables though, `result` should be 16-byte aligned.](http://stackoverflow.com/questions/841433/gcc-attribute-alignedx-explanation) — user7116, Jan 16 '13 at 20:57

Aaron D. Marasco · Answer 2 · 2013-10-02T11:28:40.460

score 2

I believe casting works if you cast properly. I don't have the code in front of me, but I'm pretty sure this worked for me:

float *arrq = reinterpret_cast<float*>(&pDest);

Note that this uses a C++ cast that names what you are doing, and that it converts the address of pDest (not pDest itself) into a float pointer.

edited Oct 02 '13 at 11:28

answered Oct 02 '13 at 11:10

This is indeed the way to go if you want to avoid needless copying. Also many C++ coders should learn to use C++ casting. Though it's cumbersome to write (well, not really with a good editor and completion), it improves readability. – St0fF Aug 25 '16 at 10:17
This is strict aliasing undefined behaviour and may break in practice. At least pointing an `int*` onto a `__m256i` can break in practice: [GCC AVX \_\_m256i cast to int array leads to wrong values](https://stackoverflow.com/q/71364764) – Peter Cordes Oct 18 '22 at 17:25
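
One way to sidestep the aliasing concern raised in the comment above, sketched under the assumption that a copy of 16 bytes is acceptable: `memcpy` from the vector object into a float array. Copying an object's bytes this way is well defined in both C and C++, and compilers typically turn a fixed-size `memcpy` like this into a plain store. The helper name `vec_copy_out` is hypothetical.

```c
#include <string.h>     /* memcpy */
#include <xmmintrin.h>  /* SSE: __m128 */

/* Hypothetical helper: copy the 16 bytes of v into out[0..3]
   without forming a float* that points into the __m128 object. */
void vec_copy_out(__m128 v, float out[4])
{
    memcpy(out, &v, sizeof v);
}
```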

manylegged · Answer 3 · 2022-10-24T14:01:31.943

score 1

You can also use _mm_cvtss_f32 to read the low element directly, without touching memory, which is convenient if you are only dealing with a few values. The _mm_storeu_ps answer is better if you are processing a whole array.

__m128 reg = _mm_set1_ps(4.0f);  // any __m128 value
float val = _mm_cvtss_f32(reg);  // val is the low element of reg

edited Oct 24 '22 at 14:01

answered Oct 18 '22 at 17:13
