3

I would like to have a Vector class (representing a vector of 3 floats) implemented with SSE intrinsics, so I will not use the 4th element of the __m128 type. I would also like to access the components as plain attributes: myVector.x would access bits 0-31 of vec, myVector.y would access bits 32-63, and so on, without having to call a getX() method. The 'x' attribute would be a sort of alias for bits 0-31 of 'vec'. Is this possible?

class Vector {
public:
  float x;
  float y;
  float z;
private:
  __m128 vec;
};
Brian Tompsett - 汤莱恩
Guillaume
  • 1
    It may be using union :) – Vyktor Feb 11 '12 at 16:17
  • 2
    Be careful with unions in this case: Microsoft says, you should not access `__m128` variables directly: http://msdn.microsoft.com/ru-ru/library/ayeb3ayc.aspx – Lol4t0 Feb 11 '12 at 16:21

3 Answers

7

No, because this violates the strict aliasing rule.

Sure you can use casts or unions to pretend the __m128 is an array of floats, but the optimizer will not maintain coherency for you, because you're breaking the language's rules.

See What is the strict aliasing rule?

(According to the rule, access using a union is safe, but that only applies when you are naming the union. Taking a pointer or reference to a union member and then using the pointer or reference directly later is not safe.)
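One way to read the components without reinterpreting the memory of the `__m128` is to copy the lanes out with a store intrinsic. The following is only a minimal sketch under a few assumptions: an x86 target with SSE, and the helper names `lane` and `lane0`, which are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_storeu_ps, _mm_cvtss_f32

// Hypothetical helper: copy all four lanes into a plain float array with a
// store intrinsic, then index the array. The __m128 object itself is never
// accessed through a float pointer.
inline float lane(__m128 v, int i)
{
    assert(i >= 0 && i < 4);
    float tmp[4];
    _mm_storeu_ps(tmp, v);  // unaligned store, so tmp needs no special alignment
    return tmp[i];
}

// Hypothetical helper: lane 0 has a dedicated intrinsic and needs no
// temporary array.
inline float lane0(__m128 v)
{
    return _mm_cvtss_f32(v);
}
```

For instance, with `__m128 v = _mm_setr_ps(1.0f, 2.0f, 3.0f, 0.0f);`, calling `lane(v, 1)` yields `2.0f` and `lane0(v)` yields `1.0f`. Whether the temporary array is optimized away depends on the compiler and its settings.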

Ben Voigt
  • How can `__m128` and the associated SSE instructions be useful if you can't access the values of the floats it contains? – Guillaume Feb 11 '12 at 17:52
  • 1
    @Guillaume: SSE provides store instructions that extract the data to an array of float. See http://msdn.microsoft.com/en-US/library/ybhzf6dk.aspx You should use these store instructions to access components of the `__m128` value, and not try to access its memory directly. – Ben Voigt Feb 11 '12 at 17:54
  • Won't this destroy the efficiency gained by using the SSE instructions? – Guillaume Feb 11 '12 at 18:17
  • 4
    @Guillaume: Design things so you don't convert to and from SSE more often than necessary. And actually the SSE load/store instructions are about the most efficient memory access method available. Still, SSE register access is faster, so try to keep things inside SSE as long as possible. – Ben Voigt Feb 11 '12 at 18:29
  • Shouldn't the answer be something like: "Yes, you can but you shouldn't because this would violate strong aliasing rule" ? – Maciej Szpakowski Feb 22 '16 at 02:40
1

You could perhaps use a union, something like

#include <xmmintrin.h>  // defines __m128

union data
{
    float xyz[4];  // in C++ the array bound follows the member name
    __m128 vec;
} aVec;

Then the floats would be aVec.xyz[0], aVec.xyz[1], and aVec.xyz[2] and the __m128 would be aVec.vec. The float array has four elements here, but nothing says you have to use the fourth one.

Ernest Friedman-Hill
  • 1
    @VJovic Are unions an exception to the strict aliasing rule? I'm using unions each time i need to access the same memory with different interpretation and its working fine with GCC. – Gigi Feb 11 '12 at 17:26
  • @VJovic Also i found this in the working draft of the c++ standard: *If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them*. – Gigi Feb 11 '12 at 17:41
  • Is the `union` construction something that _should_ be avoided but in fact _works_ with GCC, or can bugs really appear when the program is compiled with full optimization? – Guillaume Feb 11 '12 at 17:55
  • @Gigi: I elaborated on that at the end of my answer. – Ben Voigt Feb 11 '12 at 17:57
  • Using g++, you can simply add `-fno-strict-aliasing` and this would work just fine. Given that __m128 isn't exactly part of the standard, maybe platform- and tool-specific answers are just fine here. – Ernest Friedman-Hill Feb 11 '12 at 21:24
0

You can write a struct which automatically converts to and from __m128:

struct alignas(16) Vec4f
{
    float x, y, z, w;
    operator __m128() const { return _mm_load_ps(&x);}
    Vec4f(__m128 const v) { _mm_store_ps(&x, v);}
};

This has the disadvantage that Vec4f would be passed via two SSE registers instead of one (when passed by value: https://godbolt.org/z/sutmuM).

Overall, I'd suggest instead making a struct that contains just an __m128 and provides x(), y(), etc. accessor methods. Element-wise operations on SSE registers should be avoided anyway if possible (except when using the zeroth element).

N.B.: alignas(16) requires C++11; most compilers also offer compiler-specific alternatives. Alternatively, you can use _mm_loadu_ps and _mm_storeu_ps instead, which do not require 16-byte alignment.
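The wrapper suggested above (a struct holding only an `__m128`, with accessor methods) could look roughly like the sketch below. It assumes an x86 target with SSE. The type name `Vec3`, the `get` method, and the `operator+` overload are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_setr_ps, _mm_shuffle_ps, ...

// Hypothetical wrapper: the only data member is the __m128 itself, so the
// value can stay in a single SSE register.
struct Vec3
{
    __m128 v;

    // Lane 3 is unused and set to zero.
    Vec3(float x, float y, float z) : v(_mm_setr_ps(x, y, z, 0.0f)) {}
    explicit Vec3(__m128 r) : v(r) {}

    // Lane 0 can be read directly.
    float x() const { return _mm_cvtss_f32(v); }
    // Other lanes are first broadcast with a shuffle, then read from lane 0.
    float y() const { return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1))); }
    float z() const { return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2))); }

    // Indexed read through an unaligned store into a temporary array.
    float get(int i) const
    {
        assert(i >= 0 && i < 3);
        float tmp[4];
        _mm_storeu_ps(tmp, v);
        return tmp[i];
    }
};

// Arithmetic works on the whole register, without touching single lanes.
inline Vec3 operator+(Vec3 a, Vec3 b)
{
    return Vec3(_mm_add_ps(a.v, b.v));
}
```

With this sketch, `Vec3 c = Vec3(1.0f, 2.0f, 3.0f) + Vec3(10.0f, 20.0f, 30.0f);` gives `c.x() == 11.0f`, `c.y() == 22.0f` and `c.z() == 33.0f`. The accessors are calls rather than data members, so this gives `v.x()` instead of the `v.x` syntax asked for in the question.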

chtz