
I'm playing around with SIMD and wonder why there is no analogue to _mm_cvtsd_f64 to extract the upper (high-order) double from a __m128d.

GCC 4.6+ has an extension which achieves this in a nice way:

__m128d a = ...;
double d1 = a[0];
double d2 = a[1];
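A minimal sketch of the subscript extension, assuming GCC 4.6+ or Clang targeting x86 with SSE2. The helper names `low_via_subscript` and `high_via_subscript` are hypothetical. Note that `_mm_set_pd` takes its arguments high element first, so element 0 is the low lane, which is the same value `_mm_cvtsd_f64` returns.

```c
#include <emmintrin.h>

/* Hypothetical helpers. Subscripting a __m128d relies on the GCC/Clang
   vector extension; it is not a portable intrinsic. */
static inline double low_via_subscript(__m128d v)  { return v[0]; } /* low lane  */
static inline double high_via_subscript(__m128d v) { return v[1]; } /* high lane */
```

For example, with `__m128d v = _mm_set_pd(2.0, 1.0);` the low lane holds 1.0 and the high lane holds 2.0.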

But on older GCC (e.g. 4.4) the only way I could manage this was to define my own analogue using __builtin_ia32_vec_ext_v2df, i.e.:

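#include <emmintrin.h>

/* Upper-lane counterpart of _mm_cvtsd_f64 (relies on a GCC-specific builtin) */
extern __inline double __attribute__((__gnu_inline__, __always_inline__, __artificial__))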
_mm_cvtsd_f64_h (__m128d __A)
{
  return __builtin_ia32_vec_ext_v2df (__A, 1);
}

__m128d a = ...;
double d1 = _mm_cvtsd_f64(a);
double d2 = _mm_cvtsd_f64_h(a);

Is this really the way to go? Is there any alternative that does not use potentially compiler-specific __builtin stuff? And again - why is there no _mm_cvtsd_f64_h or similar predefined?

The alternative I could come up with is much slower, by the way:

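static inline double _mm_cvtsd_f64_h(__m128d __A) {
    double d[2];
    /* Unaligned store: a plain local double[2] is not guaranteed the
       16-byte alignment that _mm_store_pd requires */
    _mm_storeu_pd(d, __A);
    return d[1];
}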
milianw
  • MOVHPD, _mm_storeh_pd() intrinsic ought to work. – Hans Passant Oct 14 '13 at 12:26
  • I tested Hans' suggestion with both the MS and gcc compilers, and I believe it generates the simplest code. –  Oct 14 '13 at 13:19
  • I also just tested it in my code, but _mm_storeh_pd seems to be a few percent slower than the accepted answer below, though faster than my bad _mm_cvtsd_f64_h approach above. – milianw Oct 19 '13 at 11:51
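Hans Passant's `_mm_storeh_pd` suggestion can be sketched as follows; `high_via_storeh` is a hypothetical name. The intrinsic writes only the upper double of the vector (MOVHPD) to memory, and the destination is a plain `double`, so no 16-byte alignment is needed. Whether the compiler actually goes through memory or keeps the value in a register depends on the compiler and optimization level.

```c
#include <emmintrin.h>

/* Hypothetical helper: store the upper lane of v into a scalar (MOVHPD). */
static inline double high_via_storeh(__m128d v)
{
    double hi;
    _mm_storeh_pd(&hi, v); /* only the high double is written */
    return hi;
}
```

The matching `_mm_storel_pd` stores the low lane the same way.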

2 Answers


I suggest that you use the following code:

static inline double _mm_cvtsd_f64_h(__m128d x) {
    return _mm_cvtsd_f64(_mm_unpackhi_pd(x, x));
}

This is likely the fastest way to get the upper half of an xmm register, and it is compatible with MSVC/icc/gcc/clang.
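A self-contained sketch of this answer, using the hypothetical names `extract_high_pd` and `extract_high_pd_shuf` (identifiers starting with `_mm_` are best left to the implementation). `_mm_unpackhi_pd(x, x)` copies the upper lane of `x` into both lanes, after which `_mm_cvtsd_f64` reads the low one. Swapping the lanes with `_mm_shuffle_pd(x, x, 1)` is another plain SSE2 option, shown for comparison. Both use only `<emmintrin.h>` intrinsics, with no `__builtin` calls.

```c
#include <emmintrin.h>

/* Hypothetical helper: broadcast the high lane (UNPCKHPD), read the low lane. */
static inline double extract_high_pd(__m128d x)
{
    return _mm_cvtsd_f64(_mm_unpackhi_pd(x, x));
}

/* Hypothetical alternative: imm = 1 puts x[1] in lane 0 (SHUFPD). */
static inline double extract_high_pd_shuf(__m128d x)
{
    return _mm_cvtsd_f64(_mm_shuffle_pd(x, x, 1));
}
```

Together with `_mm_cvtsd_f64(x)` for the low lane, this covers both elements.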

Marat Dukhan

You can just use a union:

union {
    __m128d v;
    double a[2];
} U;

Assign your __m128d to U.v and read back U.a[0] or U.a[1]. Any decent compiler will optimise away redundant stores and loads.
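A minimal sketch of the union approach in C; `high_via_union` is a hypothetical name. C99 (TC3) and C11 describe reading a union member other than the one last stored as a reinterpretation of the stored bytes. C++ does not, which is what the comments below are about, so C++ code may prefer `memcpy` or the intrinsics in the other answer.

```c
#include <emmintrin.h>

/* Hypothetical helper: type-pun through a union (reinterpretation in C99/C11). */
static inline double high_via_union(__m128d v)
{
    union {
        __m128d v;
        double a[2];
    } u;
    u.v = v;       /* store the vector member */
    return u.a[1]; /* element 1 is the high lane */
}
```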

Paul R
  • Clear case of UB – Severin Pappadeux Jan 02 '18 at 05:50
  • @SeverinPappadeux: in theory, yes, but in practice it’s such a common usage that compilers tend to generate correct code for it (or optimise it away completely, where possible). – Paul R Jan 02 '18 at 07:25
  • @SeverinPappadeux: see also [this answer](https://stackoverflow.com/a/26012491/253056), and particularly [@mafso](https://stackoverflow.com/users/1741125/mafso)’s comment below it. – Paul R Jan 02 '18 at 07:40