I'm using C and want to apply AVX2 code to 4 doubles at a time. The operations are as follows (per double):
- Access the "second 4 bytes" of the double (on little-endian x86, the upper 32 bits) as an `int32_t`, e.g. `((union { double a; int32_t b[2]; }) {.a = XXX}).b[1]`, where `XXX` is the input double
- Subtract an `int32_t` constant `c` from that value
- Convert the result to a double
- Multiply the double by some number `z`
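For reference, the per-double steps above can be written as a plain scalar function. This is a minimal sketch assuming a little-endian target such as x86, where `b[1]` of the union holds the upper 32 bits of the double; the name `scalar_ref` is hypothetical. The subtraction is done in unsigned arithmetic so it wraps like `_mm256_sub_epi32` instead of hitting signed-overflow undefined behaviour.

```c
#include <stdint.h>

/* Hypothetical scalar reference for one double.
   Assumes little-endian: u.b[1] is the upper 32 bits of the double. */
static double scalar_ref(double in, int32_t c, double z)
{
    union { double a; int32_t b[2]; } u = { .a = in };
    /* Wrapping subtraction, matching _mm256_sub_epi32. The conversion back
       to int32_t is implementation-defined for large values, but GCC, Clang
       and MSVC all wrap modulo 2^32. */
    int32_t d = (int32_t)((uint32_t)u.b[1] - (uint32_t)c);
    /* int32 -> double is exact; then scale by z. */
    return (double)d * z;
}
```

Such a function is handy for checking a vectorized version element by element.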
I've tried to implement this as follows:
- Load the input vector `in` (unaligned doubles) and reinterpret it as an int32 vector
- Load the constant `c` into a vector
- Subtract `c` from the int32 vector
- Convert the int32 values to doubles: I wasn't able to do this :(
- Multiply the doubles: not done yet, but that should be trivial
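One way to complete these steps is sketched below. `_mm256_cvtepi32_pd` converts four int32 values in a `__m128i` into four doubles in a `__m256d`, so the odd int32 elements (1, 3, 5, 7) first have to be moved into the low 128 bits. `_mm256_permutevar8x32_epi32` can do that because, unlike most AVX2 shuffles, it crosses the 128-bit lanes. The helper name `process4` is hypothetical, and `__attribute__((target("avx2")))` is a GCC/Clang extension that allows compiling without `-mavx2` (with MSVC, drop it and build with `/arch:AVX2`); the CPU must support AVX2 at run time.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = (upper32(in[i]) - c) * z for i = 0..3.
   Assumes little-endian x86; in and out need not be aligned. */
__attribute__((target("avx2")))
static void process4(const double *in, double *out, int32_t c, double z)
{
    /* Reinterpret 4 doubles as 8 int32 (no instruction emitted). */
    __m256i x = _mm256_castpd_si256(_mm256_loadu_pd(in));
    /* Subtract c from all 8 int32; only the odd ones are kept below. */
    __m256i y = _mm256_sub_epi32(x, _mm256_set1_epi32(c));
    /* Cross-lane gather of elements 1, 3, 5, 7 into the low 128 bits. */
    __m256i idx = _mm256_setr_epi32(1, 3, 5, 7, 0, 0, 0, 0);
    __m128i hi = _mm256_castsi256_si128(_mm256_permutevar8x32_epi32(y, idx));
    /* Four int32 -> four doubles (exact), then multiply by z. */
    __m256d d = _mm256_cvtepi32_pd(hi);
    _mm256_storeu_pd(out, _mm256_mul_pd(d, _mm256_set1_pd(z)));
}
```

The index vector is a constant, so in a loop the compiler can keep it in a register.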
My current code is roughly this:

```c
#include <immintrin.h>

// in: pointer to 4 (possibly unaligned) doubles
__m256i x = _mm256_castpd_si256(_mm256_loadu_pd(in)); // reinterpret 4 doubles as 8 int32 (no instruction emitted)
const __m256i c = _mm256_set1_epi32(1234);
__m256i y = _mm256_sub_epi32(x, c); // only every second int32 (the upper half of each double) matters; can this be made more efficient?
// Tried to shuffle the values so that the relevant int32 values come first. Maybe the conversion works then?
//__m256i z = _mm256_shuffle_epi32(y, _MM256_SHUFFLE(0, 2, 4, 6, 1, 3, 5, 7));
```
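Regarding the commented-out shuffle: there is no `_MM256_SHUFFLE` macro. `_mm256_shuffle_epi32` takes an 8-bit immediate (built with `_MM_SHUFFLE`) and applies the same 4-element pattern inside each 128-bit lane, so it cannot move elements across lanes on its own. The sketch below pairs it with `_mm256_permute4x64_epi64` to pack elements 1, 3, 5, 7 into the low half, which is the `__m128i` form `_mm256_cvtepi32_pd` expects. The helper name `high_halves` is hypothetical, and the same GCC/Clang `target("avx2")` assumption applies.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = upper 32 bits of in[i] (little-endian x86). */
__attribute__((target("avx2")))
static void high_halves(const double *in, int32_t out[4])
{
    __m256i x = _mm256_castpd_si256(_mm256_loadu_pd(in));
    /* In-lane shuffle: lane 0 -> [e1, e3, e1, e3], lane 1 -> [e5, e7, e5, e7]. */
    __m256i t = _mm256_shuffle_epi32(x, _MM_SHUFFLE(3, 1, 3, 1));
    /* Cross-lane: take 64-bit chunks 0 and 2 ([e1,e3], [e5,e7]) as the low half. */
    t = _mm256_permute4x64_epi64(t, _MM_SHUFFLE(3, 1, 2, 0));
    _mm_storeu_si128((__m128i *)out, _mm256_castsi256_si128(t));
}
```

Both this pair of immediate shuffles and a single `_mm256_permutevar8x32_epi32` with an index vector produce the same low half; which is cheaper depends on the surrounding code.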
Does anyone have an idea how I can convert the four int32 values to four doubles? Also, if you know of a "magic" instruction that could improve other parts, please let me know.
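On the efficiency question (half of each `_mm256_sub_epi32` result is discarded): when processing an array, one option is to combine two loads of 4 doubles with `_mm256_shuffle_ps`, which picks 32-bit elements from two sources, so that a single subtraction covers 8 values. This is a sketch under the same assumptions as before (little-endian x86, GCC/Clang `target` attribute, hypothetical helper name `process8`); whether it is actually faster should be measured in the real loop.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = (upper32(in[i]) - c) * z for i = 0..7. */
__attribute__((target("avx2")))
static void process8(const double *in, double *out, int32_t c, double z)
{
    /* View two groups of 4 doubles as floats; shuffles only move bits. */
    __m256 a = _mm256_castpd_ps(_mm256_loadu_pd(in));
    __m256 b = _mm256_castpd_ps(_mm256_loadu_pd(in + 4));
    /* Per 128-bit lane: lane 0 = [a1, a3, b1, b3], lane 1 = [a5, a7, b5, b7]. */
    __m256 odd = _mm256_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1));
    /* Reorder 64-bit chunks to [a1, a3, a5, a7, b1, b3, b5, b7]. */
    __m256i hi = _mm256_permute4x64_epi64(_mm256_castps_si256(odd),
                                          _MM_SHUFFLE(3, 1, 2, 0));
    /* One integer subtraction for all 8 upper halves. */
    hi = _mm256_sub_epi32(hi, _mm256_set1_epi32(c));
    __m256d vz = _mm256_set1_pd(z);
    __m256d first = _mm256_cvtepi32_pd(_mm256_castsi256_si128(hi));
    __m256d second = _mm256_cvtepi32_pd(_mm256_extracti128_si256(hi, 1));
    _mm256_storeu_pd(out, _mm256_mul_pd(first, vz));
    _mm256_storeu_pd(out + 4, _mm256_mul_pd(second, vz));
}
```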
Thanks a lot