
Since casts like this:

__m256d a;
uint64_t t[4];
_mm256_store_si256( (__m256i*)t, (__m256i)a ); /* Cast of 'a' to __m256i not allowed */

are not allowed when compiling under Visual Studio, I thought I could use an intrinsic function to convert a __m256d value into a __m256i before passing it to _mm256_store_si256, thus avoiding the cast that causes the error.

But after looking through that list, I couldn't find a function that takes a __m256d argument and returns a __m256i value. So maybe you could help me write my own function or find the one I'm looking for: a function that stores 4x 64-bit double values to an array of 4x 64-bit integers.

EDIT:

After further research, I found _mm256_cvtpd_epi64, which seems to be exactly what I want. But my CPU doesn't support the AVX512 instruction set...

What is left for me to do here?

  • @PaulR: I *think* the OP wants to store 4x 64-bit `double` bit-patterns to an array of 64-bit integers, without double->int conversion. – Peter Cordes Jun 24 '18 at 16:24
  • How about using a struct instead of those Intel-specific intrinsic types? – Biswapriyo Jun 24 '18 at 16:25
  • There is no problem with the double->int conversion, that's no matter – Tom Clabault Jun 24 '18 at 16:27
  • @PeterCordes: yes, you’re probably right - the question is not very clear. – Paul R Jun 24 '18 at 16:29
  • @Biswapriyo What do you mean by using a struct? – Tom Clabault Jun 24 '18 at 16:29
  • I haven't seen the array. I assume that if you are using a 256-bit integer then it may be used with a 4 x 64-bit array or a 4-member struct. – Biswapriyo Jun 24 '18 at 16:33
  • I understand what you mean but I'm not sure if such a struct would do the job of replacing special Intel's intrinsic types used in arguments for the intrinsic functions. – Tom Clabault Jun 24 '18 at 16:36
  • @Biswapriyo: A union like `union { uint64_t t[4]; __m256d vec; };` would be another option for type-punning with C, or for C++ with some compilers. But it turns out the OP isn't looking for type-punning, they want to *convert* to integer. – Peter Cordes Jun 24 '18 at 16:42

1 Answer


You could use _mm256_store_pd( (double*)t, a). I'm pretty sure this is strict-aliasing safe because you're not directly dereferencing the pointer after casting it. The _mm256_store_pd intrinsic wraps the store with any necessary may-alias stuff.

(With AVX512, Intel switched to using void* for the load/store intrinsics instead of float*, double*, or __m512i*, to remove the need for these clunky casts and make it more clear that intrinsics can alias anything.)
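A minimal sketch of the _mm256_store_pd approach, assuming an x86-64 target with AVX. The helper name store_double_bits and the AVX_FN macro are made up for illustration; the macro only exists so that GCC/Clang accept the AVX intrinsics without -mavx, and MSVC doesn't need it.

```c
#include <immintrin.h>
#include <stdalign.h>
#include <stdint.h>

/* Illustration-only macro: enables AVX for one function on GCC/Clang.
   MSVC accepts AVX intrinsics without any attribute. */
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* Hypothetical helper: writes the raw bit patterns of in[0..3] to out[0..3].
   out must be 32-byte aligned because _mm256_store_pd is an aligned store. */
AVX_FN void store_double_bits(const double in[4], uint64_t out[4])
{
    __m256d a = _mm256_loadu_pd(in);     /* unaligned load of 4 doubles */
    _mm256_store_pd((double *)out, a);   /* no (__m256i) cast needed */
}
```

A caller would declare the destination as alignas(32) uint64_t t[4]; and then read t[i] as ordinary integers.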

The other option is to _mm256_castpd_si256 to reinterpret the bits of your __m256d as a __m256i:

alignas(32) uint64_t t[4];
_mm256_store_si256( (__m256i*)t,  _mm256_castpd_si256(a));

If you read from t[] right away, your compiler might optimize away the store/reload and just shuffle or pextrq rax, xmm0, 1 to extract FP bit patterns directly into integer registers. You could write this manually with intrinsics. Store/reload is not bad, though, especially if you want more than 1 of the double bit-patterns as scalar integers.
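As a rough sketch of doing the extraction manually, something like the following pulls two of the bit patterns straight into integer registers without going through memory. It assumes x86-64 (for _mm_extract_epi64) and AVX; the function name extract_double_bits and the AVX_FN macro are hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Illustration-only macro: enables AVX for one function on GCC/Clang. */
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* Hypothetical helper: returns the bit patterns of in[1] and in[3]. */
AVX_FN void extract_double_bits(const double in[4],
                                uint64_t *lane1, uint64_t *lane3)
{
    __m256d a    = _mm256_loadu_pd(in);
    __m256i bits = _mm256_castpd_si256(a);            /* reinterpret only */
    __m128i lo   = _mm256_castsi256_si128(bits);      /* elements 0 and 1 */
    __m128i hi   = _mm256_extractf128_si256(bits, 1); /* elements 2 and 3 */
    *lane1 = (uint64_t)_mm_extract_epi64(lo, 1);      /* pextrq */
    *lane3 = (uint64_t)_mm_extract_epi64(hi, 1);
}
```

If you need all four elements, the plain store/reload is probably simpler and not slower.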

You could instead use union m256_elements { uint64_t u64[4]; __m256d vecd; };, but there's no guarantee that will compile efficiently.
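For completeness, the union version might look like this in C, where reading a different member than the one last written is allowed. This is only a sketch: union_double_bits and AVX_FN are illustrative names, and whether it compiles to the same code as the intrinsic versions depends on the compiler.

```c
#include <immintrin.h>
#include <stdint.h>

/* Illustration-only macro: enables AVX for one function on GCC/Clang. */
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

union m256_elements {
    uint64_t u64[4];
    __m256d  vecd;
};

/* Hypothetical helper: type-pun through the union, then copy out. */
AVX_FN void union_double_bits(const double in[4], uint64_t out[4])
{
    union m256_elements u;
    u.vecd = _mm256_loadu_pd(in);   /* write the vector member */
    for (int i = 0; i < 4; ++i)
        out[i] = u.u64[i];          /* read the same bytes as integers */
}
```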


The _mm256_castpd_si256 cast compiles to zero asm instructions, i.e. it's just a type-pun to keep the C compiler happy.

If you wanted to actually round packed double to the nearest signed or unsigned 64-bit integer and have the result in 2's complement or unsigned binary instead of IEEE754 binary64, you need AVX512DQ _mm256/512_cvtpd_epi64 (vcvtpd2qq) for it to be efficient; the 256-bit version also requires AVX512VL. SSE2 + x86-64 can do it for scalar, or you can use some packed FP hacks for numbers in the [0..2^52] range: see "How to efficiently perform double/int64 conversions with SSE/AVX?".
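To make the difference between converting and type-punning concrete, here is a scalar sketch that needs only SSE2 on x86-64. Both function names are made up for this example. The conversion uses _mm_cvtsd_si64 (cvtsd2si), which rounds according to the current MXCSR rounding mode, round-to-nearest-even by default.

```c
#include <emmintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: numeric conversion, one element at a time. */
void convert_doubles_to_int64(const double in[4], int64_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (int64_t)_mm_cvtsd_si64(_mm_set_sd(in[i]));
}

/* Hypothetical helper: no conversion, just copies the IEEE754 bit patterns. */
void copy_double_bits(const double in[4], uint64_t out[4])
{
    memcpy(out, in, 4 * sizeof(double));
}
```

With the default rounding mode, an input of 3.0 gives 3 from the first function and 0x4008000000000000 from the second.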


BTW, storeu doesn't require an aligned destination, but store does. If the destination is a local, you should normally align it instead of using an unaligned store, at least if the store happens in a loop, or if this function can inline into a larger function.
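A short sketch of both cases, again assuming AVX; the function names and the AVX_FN macro are invented for illustration. The first function uses an aligned local with store, the second accepts any destination address and so uses storeu.

```c
#include <immintrin.h>
#include <stdalign.h>
#include <stdint.h>

/* Illustration-only macro: enables AVX for one function on GCC/Clang. */
#if defined(__GNUC__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

/* Local destination: align it and use the aligned store. */
AVX_FN uint64_t xor_double_bits(const double in[4])
{
    alignas(32) uint64_t t[4];
    __m256d a = _mm256_loadu_pd(in);
    _mm256_store_si256((__m256i *)t, _mm256_castpd_si256(a));
    return t[0] ^ t[1] ^ t[2] ^ t[3];
}

/* Caller-supplied destination of unknown alignment: use storeu.
   out must point to at least 32 writable bytes. */
AVX_FN void store_bits_unaligned(const double in[4], void *out)
{
    __m256d a = _mm256_loadu_pd(in);
    _mm256_storeu_si256((__m256i *)out, _mm256_castpd_si256(a));
}
```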

Peter Cordes
  • _mm256_castpd_si256 seems to do the job. Thanks – Tom Clabault Jun 24 '18 at 17:31
  • @TomClabault: oh, so the edit to your question is wrong, and you *didn't* want to convert to integer with `_mm256_cvtpd_epi64` or a non-AVX512 equivalent? You just want to type-pun the `double` bit-patterns to `uint64_t`? If yes, I'll undo the duplicate-close. – Peter Cordes Jun 24 '18 at 18:02
  • Yes, that's no conversion but type-pun from double to int – Tom Clabault Jun 24 '18 at 18:46