13

Consider this variable declaration:

union {
    struct {
        float x, y, z, padding;
    } components;
    __m128 sse;
} _data;

My idea is to assign the values through the x, y, z fields, perform SSE2 computations, and read the result back through x, y, z. I have slight doubts as to whether this is legal, though. My concern is alignment: MSDN says that __m128 variables are automatically aligned to a 16-byte boundary, and I wonder if my union can break this behavior. Are there any other pitfalls to consider here?

Violet Giraffe
  • No, the alignment is not an issue. The union will have whatever alignment is needed for all its members to work correctly. – R. Martinho Fernandes Feb 23 '13 at 20:20
  • 2
    Also note that, at least on Visual Studio, you can get the components of `__m128 sse;` with `sse.m128_f32[0]`, `sse.m128_f32[1]`, `sse.m128_f32[2]`, `sse.m128_f32[3]`, so there's no need for this trick. – R. Martinho Fernandes Feb 23 '13 at 20:24
  • @R.MartinhoFernandes and for gcc? – BЈовић Feb 23 '13 at 20:54
  • For gcc, you can do even simpler: `sse[0]`, `sse[1]`, etc, although up to 4.7 that only works in C, and you need g++-4.8 to get that in C++. With a union, alignment and aliasing will be fine, but code quality will suck, better use `((float*)&sse)[1]`. – Marc Glisse Feb 23 '13 at 21:07
  • @MarcGlisse: Better why? And how will quality suck? It's not obvious to me, could you elaborate? – Violet Giraffe Feb 23 '13 at 21:36
  • 6
    As a side note, SSE datatypes aren't meant to be accessed this way, so there is typically a significant performance penalty for doing so. Do this only when you're packing/unpacking data and you have a LOT of work to be done on the packed data. – Mysticial Feb 23 '13 at 22:00
  • @Mysticial: but is it even correct at all? – Violet Giraffe Feb 23 '13 at 22:02
  • Yes, it will be correct. – Mysticial Feb 23 '13 at 22:03
  • @Mysticial: thank you. I think this explains the performance hit I'm seeing in debug build after introducing the union instead of plain floats (no actual SSE computations yet). However, there is no performance difference in release build (with /O2). – Violet Giraffe Feb 23 '13 at 22:14
  • The compiler might be able to work around it if it sees what you're trying to do. SSE4.1 has direct support for accessing SIMD elements. So if the compiler is good enough, it will use them (if you allow it to use SSE4.1). – Mysticial Feb 23 '13 at 22:15
  • @VioletGiraffe "Better" just because gcc happens to suck at optimizing unions (it does a lot of useless copying), in theory it could produce the same code. – Marc Glisse Feb 24 '13 at 07:55
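
For the cross-compiler element access discussed in these comments, one option that avoids both the MSVC-only `m128_f32` member and the GCC-only subscript is to store the register to a plain float array with `_mm_storeu_ps`. The following is only a minimal sketch: `Lanes`, `to_lanes` and `from_xyz` are hypothetical names made up for illustration, not part of any header.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_set_ps, _mm_storeu_ps

// Hypothetical holder for the four lanes of an __m128.
struct Lanes {
    float v[4];
};

// Copy all four lanes into ordinary floats. The unaligned store is used
// so that the destination array needs no special alignment.
inline Lanes to_lanes(__m128 m) {
    Lanes out;
    _mm_storeu_ps(out.v, m);
    return out;
}

// Pack x, y, z into lanes 0..2 and zero the unused fourth lane.
// Note that _mm_set_ps takes its arguments from the highest lane down.
inline __m128 from_xyz(float x, float y, float z) {
    return _mm_set_ps(0.0f, z, y, x);
}
```

For example, `to_lanes(_mm_add_ps(from_xyz(1, 2, 3), from_xyz(10, 20, 30)))` yields the component-wise sums in `v[0]` through `v[2]`. As noted above, packing and unpacking like this is best kept outside the hot loop.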

1 Answer

6

The union's alignment should be fine, but on Windows you may be able to access the 32-bit components directly. From xmmintrin.h (DirectXMath):

typedef union __declspec(intrin_type) _CRT_ALIGN(16) __m128 {
     float               m128_f32[4];
     unsigned __int64    m128_u64[2];
     __int8              m128_i8[16];
     __int16             m128_i16[8];
     __int32             m128_i32[4];
     __int64             m128_i64[2];
     unsigned __int8     m128_u8[16];
     unsigned __int16    m128_u16[8];
     unsigned __int32    m128_u32[4];
 } __m128;

As you can see, there are four floats in there. If you want to be uber paranoid, you can probably define all the same alignment specialities and such to make sure nothing will break. As far as I can see, however, and given that you mentioned MSDN in your question, you should be all good to go. Both the union and accessing it directly should work if you know you have SSE-compatible hardware. You can poke around the DirectXMath headers as well to get a feel for how Windows does the definitions and wrangling itself: they also define a few macros depending on which intrinsics and capabilities are present at compile time.

EDIT: As R. Martinho Fernandes says in the comments, accessing it directly is probably way less of a headache than redefining it in a union.
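
If you do stay with the union, the alignment claim can be checked at compile time. Below is a minimal sketch under a few assumptions: `Vec3`, `make_vec3` and `add` are hypothetical names invented for this example, and reading a union member other than the one last written is something GCC documents as supported but the ISO C++ standard does not define.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: __m128, _mm_add_ps

// Hypothetical wrapper around the union from the question.
struct Vec3 {
    union {
        struct {
            float x, y, z, padding;
        } components;
        __m128 sse;
    } data;
};

// The union takes on the alignment of its strictest member, so these
// checks are expected to pass on compilers that align __m128 to 16 bytes.
static_assert(alignof(Vec3) == alignof(__m128), "union keeps the __m128 alignment");
static_assert(sizeof(Vec3) == 16, "four packed floats");

// Fill the vector through the float fields; the padding lane is zeroed.
inline Vec3 make_vec3(float x, float y, float z) {
    Vec3 v;
    v.data.components.x = x;
    v.data.components.y = y;
    v.data.components.z = z;
    v.data.components.padding = 0.0f;
    return v;
}

// Add two vectors with one SSE instruction; the caller reads the result
// back through the float fields.
inline Vec3 add(const Vec3& a, const Vec3& b) {
    // Runtime sanity check of the alignment the static_assert promises.
    assert(reinterpret_cast<std::uintptr_t>(&a) % alignof(__m128) == 0);
    Vec3 r;
    r.data.sse = _mm_add_ps(a.data.sse, b.data.sse);
    return r;
}
```

With this, `add(make_vec3(1, 2, 3), make_vec3(10, 20, 30))` gives a vector whose `data.components` fields hold the component-wise sums.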

  • 1
    I wanted to keep my bits of code cross-platform, hence the union trick. – Violet Giraffe Feb 23 '13 at 20:38
  • @VioletGiraffe Then the union should be just fine. GCC should respect the union as well and not do anything funky either, but I'm not a GCC expert and I'm sure some Standardese Lawyer will come along and condemn us both to the deepest of hells for using `union`. –  Feb 23 '13 at 20:42