
I would like to know whether I'm breaking strict aliasing rules with this snippet. (I think so, since it dereferences a type-punned pointer; however, it's done in a single expression and /Wall doesn't complain.)

inline double plop() const // member function
{
    __m128d x = _mm_load_pd(v);
    ... // some stuff
    return *(reinterpret_cast<double*>(&x)); // return the lower double in xmm reg referred to by x.
}

If yes, what's the workaround? Using different representations simultaneously is becoming hardcore once you want to respect the spec.

Thanks for your answers, I'm losing my good mood trying to find a solution.

Answers that won't be accepted and why:

"use mm_store" -> The optimizer fails to remove it if the following instructions require an xmm register so it generates a load just after it. Store + load for nothing.

"use a union" -> Aliasing rule violation if using the two types for the same object. If I understood well the article written by Thiago Macieira.

famousgarkin
PixelRick
  • What about plain old `memcpy` into a `double`? – Praetorian Apr 17 '14 at 18:16
  • It's almost impossible to avoid aliasing when dealing with SIMD. Ideally you avoid accessing individual elements like you are right now, but if you absolutely need to, I recommend a union for things on the stack and a pointer cast for pointers coming from parameters. Union type-punning is explicitly allowed in C99, and all mainstream compilers will carry it over to C++ as well. Trying to be completely standard compliant when dealing with a non-standard extension is, to some extent, self-contradictory in the first place. – Mysticial Apr 17 '14 at 18:20
  • @Praetorian : isn't using SIMD intrinsics and calling memcpy kinda paradoxical ? ^^ – PixelRick Apr 17 '14 at 18:23
  • @Mysticial : I'm accessing only the lower element to get the result, so it's fine for this particular example. I can't use a union since my vec4 looks like this: { union {T x; T r;}; union {T y; T g;}; ... }. Anonymous structs are not allowed so I'm pretty stuck. The main problem here is that the return statement (if inlined) is not atomic, so it may be dereferenced at the wrong moment... – PixelRick Apr 17 '14 at 18:30
  • Are you saying that you also have concurrency problems? If so you'll want to fix that first. IIRC, C++11 specifies that any race condition to the same object involving at least one write is UB. – Mysticial Apr 17 '14 at 18:36
  • @Mysticial "Union type-punning is explicitly allowed in C99, and all mainstream compilers will carry it over to C++ as well." Even when it is supported it may not generate good code: http://blog.regehr.org/archives/959. Also IIRC there are compilers that don't support it (I think Solaris CC was one example). It's better to simply use memcpy. – bames53 Apr 17 '14 at 19:40
  • I think you should still use `_mm_storel_pd` and just find a workaround for the optimizer bug. Insert no-ops or something. – bames53 Apr 17 '14 at 19:56
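
For reference, the `memcpy` suggestion from the comments above could look roughly like the sketch below. The helper name `low_double_memcpy` is made up for illustration and is not from the question. Copying bytes does not run into the aliasing rules, and on x86 the low element of the vector sits at the lowest address. Whether the copy is folded into a plain register move depends on the compiler and optimization level, so check the generated assembly before relying on it.

```cpp
#include <cstring>      // std::memcpy
#include <emmintrin.h>  // SSE2: __m128d, _mm_set_pd

// Hypothetical helper (name is an assumption, not from the question):
// copy the first 8 bytes of the vector object into a double.
inline double low_double_memcpy(__m128d x)
{
    double d;
    std::memcpy(&d, &x, sizeof d);  // bytes 0..7 hold the low element on x86
    return d;
}
```

Usage would be something like `double r = low_double_memcpy(x);` in place of the pointer cast in the question.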

4 Answers

4

There is only one intrinsic that "extracts" the lower order double value from xmm register:

double _mm_cvtsd_f64 (__m128d a)

You could use it this way:

return _mm_cvtsd_f64(x);

The references contradict each other somewhat. MSDN says: "This intrinsic does not map to any specific machine instruction", while the Intel intrinsics guide mentions the movsd instruction. In the latter case, the additional instruction is easily eliminated by the optimizer. At least gcc 4.8.1 with the -O2 flag generates code with no additional instruction.
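
As a minimal sketch, here is the function from the question rewritten as a free function around this intrinsic. The name `low_double` and the pointer parameter `v` are assumptions for illustration; the elided "some stuff" from the question is left out.

```cpp
#include <emmintrin.h>  // SSE2: _mm_load_pd, _mm_cvtsd_f64

// Hypothetical free-function version of plop(); v is assumed to point to
// two doubles aligned to 16 bytes, as _mm_load_pd requires.
inline double low_double(const double* v)
{
    __m128d x = _mm_load_pd(v);
    // ... the question's elided computation would go here
    return _mm_cvtsd_f64(x);  // low element, no pointer cast involved
}
```

No object is accessed through a pointer of a different type here, so the strict aliasing question does not arise.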

Evgeny Kluev
  • Intel says it doesn't map either :) This is the way to rely on compiler impl to respect aliasing rules. – PixelRick Apr 22 '14 at 14:52
3

The bullet point about aggregate or union types (bold in the original post) should, I think, allow your cast here, since we may consider __m128d as an aggregate of two doubles in a union with the full register. With regard to strict aliasing, compilers have always been very lenient about unions, whereas originally only a cast to (char*) was supposed to be valid.

§3.10: If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined (The intent of this list is to specify those circumstances in which an object may or may not be aliased):

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.
Mysticial
galop1n
  • Really nice answer. I wasn't taking into account that no compiler actually uses this type as a keyword, but rather as a specific type that is either a typedef of an explicitly compatible type or a union including a compatible representation... Let me do some checks and I'll accept your answer with all my gratitude. – PixelRick Apr 18 '14 at 01:39
  • This is the second best solution, but it requires re-wrapping every SIMD type since some compilers have weird special representations that don't include the double array. – PixelRick Apr 22 '14 at 14:49
  • MSVC defines `__m128d` as an aggregate (a union specifically), but other compilers don't. (MSVC doesn't enforce strict-aliasing so this code is definitely safe there). **It's not safe in general to point a `double *` into a `__m128d`, only [vice versa](https://stackoverflow.com/q/52112605)**; This does break in practice, for example [GCC AVX \_\_m256i cast to int array leads to wrong values](https://stackoverflow.com/q/71364764) – Peter Cordes May 17 '22 at 13:45
0

Yes, I think this breaks strict aliasing. However, in practice this is usually fine.
(I'm mostly writing this as an answer because it's difficult to describe well in a comment.)

But, you could instead do something like this:

inline double plop() const // member function
{
    __m128d x = _mm_load_pd(v);
    ... // some stuff

    union {
        unsigned long long i; // 64-bit int
        double             d; // 64-bit double
    };

    i = _mm_cvtsi128_si64(_mm_castpd_si128(x)); // _mm_castpd_si128 to interpret the register as an int vector, _mm_cvtsi128_si64 to extract the lowest 64-bits

    return d; // use the union to return the value as a double without breaking strict aliasing
}
Apriori
  • According to the standard, after one member of a union is assigned to, the value of all other members becomes unspecified. Unions cannot be portably used to reinterpret bit patterns. – Sneftel Apr 18 '14 at 15:06
  • @Sneftel: This is [Implementation Defined](http://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html) behavior, the implementation is required to define the behavior. In the case of GCC (and every compiler I've ever used) It does not break strict aliasing. You can read how the behavior is defined [here](http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type%2dpunning). – Apriori Apr 18 '14 at 17:19
  • The standard describes it as "unspecified", not "implementation-defined". That is, implementations are free to *not* define it. – Sneftel Apr 18 '14 at 17:24
  • @Sneftel: According to the first link above, "Some areas are only implementation-defined in one version of the standard." – Apriori Apr 18 '14 at 17:48
  • Huh, I wasn't aware of that difference between the versions. It still falls short of portable, but I agree that it is safe in practical terms. (But then again, so is just casting the pointer.) Thanks for the heads-up! – Sneftel Apr 18 '14 at 18:18
  • @Sneftel: No problem. Yeah, honestly in practice I pointer-cast because it is safe in practice, and it is IMO easier to read in the source code. I mostly use unions in forums such as this so I can defend some shred of defined behavior. – Apriori Apr 18 '14 at 18:47
  • The code in the question is a strict-aliasing violation that can break with compilers other than MSVC. This answer is safe but way over-complicated vs. using `_mm_cvtsd_f64(__m128d)` directly! If you want both doubles, just `_mm_store_pd` into `alignas(16) double tmp[2];`, and compilers will often optimize that into a shuffle. Or use `_mm_storeh_pd` to store the high half somewhere. (Compilers may optimize away a store/reload if you reuse it right away; but if that's not the case, use casts around `_mm_movehl_ps` or use `_mm_unpackhi_pd` to shuffle within a register, then cvt to scalar.) – Peter Cordes May 17 '22 at 13:53
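
The in-register route from the last comment can be sketched as follows. The helper names `low_half` and `high_half` are hypothetical; the intrinsics are the standard SSE2 ones. `_mm_unpackhi_pd(x, x)` puts the high element of `x` into the low position, so the same scalar conversion can then extract it.

```cpp
#include <emmintrin.h>  // SSE2: _mm_cvtsd_f64, _mm_unpackhi_pd

// Hypothetical helpers (names are assumptions, not from the thread).
inline double low_half(__m128d x)
{
    return _mm_cvtsd_f64(x);  // element 0
}

inline double high_half(__m128d x)
{
    // Shuffle element 1 down to position 0, then convert to scalar.
    return _mm_cvtsd_f64(_mm_unpackhi_pd(x, x));
}
```

Neither helper goes through memory in the source, so there is no store/reload for the optimizer to remove.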
0

What about return x.m128d_f64[0]; ?

  • This assumes an implementation where `__m128d` has members that can be accessed like this. Not all implementations do. For example: http://clang.llvm.org/doxygen/emmintrin_8h_source.html, where `__m128d` is defined as `typedef double __m128d __attribute__((__vector_size__(16)));` – bames53 Apr 17 '14 at 19:48
  • A compiler-dependent macro may be an idea, if all compilers have a way to express access to the elements: m128d_f64 for MSVC, the [] operator directly on clang, etc. That may be a way to make sure the optimizer part of the compiler won't get lost or be prevented from doing optimizations. – PixelRick Apr 18 '14 at 01:34
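
A rough sketch of that compiler-dependent idea, written as an inline function rather than a macro. The name `low_double_any_compiler` is made up, and the two branches rest on assumptions: that MSVC exposes the `m128d_f64` member, and that GCC and Clang define `__m128d` as a vector type that can be subscripted. Other compilers may need their own branch.

```cpp
#include <emmintrin.h>  // SSE2: __m128d

// Hypothetical accessor; the branch conditions are assumptions about
// how each compiler defines __m128d.
inline double low_double_any_compiler(__m128d x)
{
#if defined(_MSC_VER) && !defined(__clang__)
    return x.m128d_f64[0];  // MSVC: __m128d is a union with an array member
#else
    return x[0];            // GCC/Clang: vector extension subscripting
#endif
}
```

Where it is enough, `_mm_cvtsd_f64` from the answer above avoids the preprocessor branch entirely.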