You could use `_mm256_store_pd( (double*)t, a)`. I'm pretty sure this is strict-aliasing safe, because you're not directly dereferencing the pointer after casting it; the `_mm256_store_pd` intrinsic wraps the store with any necessary may-alias stuff.
(With AVX-512, Intel switched to using `void*` for the load/store intrinsics instead of `float*`, `double*`, or `__m512i*`, removing the need for these clunky casts and making it clearer that intrinsics can alias anything.)
The other option is `_mm256_castpd_si256` to reinterpret the bits of your `__m256d` as a `__m256i`:
```c
alignas(32) uint64_t t[4];
_mm256_store_si256( (__m256i*)t, _mm256_castpd_si256(a) );
```
If you read from `t[]` right away, your compiler might optimize away the store/reload and just shuffle or `pextrq rax, xmm0, 1` to extract the FP bit patterns directly into integer registers. You could write this manually with intrinsics. Store/reload is not bad, though, especially if you want more than one of the `double` bit-patterns as scalar integers.
The `_mm256_castpd_si256` cast compiles to zero asm instructions; it's just a type-pun to keep the C compiler happy. You could instead use `union m256_elements { uint64_t u64[4]; __m256d vecd; };`, but there's no guarantee that will compile efficiently.
If you wanted to actually round packed `double` to the nearest signed or unsigned 64-bit integer, with the result in 2's complement or unsigned binary instead of IEEE 754 binary64, you need AVX-512F `_mm256_cvtpd_epi64` / `_mm512_cvtpd_epi64` (`vcvtpd2qq`) for it to be efficient. SSE2 + x86-64 can do it for scalar, or you can use some packed-FP hacks for numbers in the `[0..2^52]` range: How to efficiently perform double/int64 conversions with SSE/AVX?.
BTW, `storeu` doesn't require an aligned destination, but `store` does. If the destination is a local, you should normally align it instead of using an unaligned store, at least if the store happens in a loop, or if this function can inline into a larger function.