Thanks in advance for the help. I need to perform the following shuffle pattern on an array of uint16_t data. My unprocessed array looks like this:
0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
I have transformed the unprocessed data into the format below with _mm512_permutexvar_epi16:
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
I then want to store the contents of the AVX-512 register into 4 different arrays. This is the part where I'm unsure of the best approach:
next eight values of arrayofZero's 0 0 0 0 0 0 0 0
next eight values of arrayofOne's 1 1 1 1 1 1 1 1
next eight values of arrayofTwo's 2 2 2 2 2 2 2 2
next eight values of arrayofThree's 3 3 3 3 3 3 3 3
I need to loop through my unprocessed data and populate arrayofZero's with all the 0 values, and likewise for the 1, 2, and 3 values. NOTE: my actual data is not hardcoded 0, 1, 2, 3. It is calculated data, and I need to put the
1st value in the 1st array,
2nd value in the 2nd array,
3rd value in the 3rd processed data array,
and 4th value in the 4th processed data array
That pattern repeats for the entire unprocessed data array, such that after all processing is done:
1st array holds all the 0 values
2nd array holds all the 1 values
3rd array holds all the 2 values
4th array holds all the 3 values
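To make the target layout concrete, here is a plain scalar sketch of the result I am after. It is only a reference for the expected output, not the AVX-512 version I am asking about. The name deinterleave4 and the std::vector outputs are placeholders for illustration, and it assumes the input length is a multiple of 4.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference only (hypothetical helper, not part of my real code):
// splits interleaved data (a b c d a b c d ...) into four separate streams.
// Assumes src.size() is a multiple of 4; any trailing partial group is ignored.
std::array<std::vector<std::uint16_t>, 4> deinterleave4(const std::vector<std::uint16_t>& src)
{
    std::array<std::vector<std::uint16_t>, 4> out;
    for (auto& stream : out)
        stream.reserve(src.size() / 4);

    for (std::size_t i = 0; i + 3 < src.size(); i += 4)
    {
        out[0].push_back(src[i]);     // 1st value of each group
        out[1].push_back(src[i + 1]); // 2nd value of each group
        out[2].push_back(src[i + 2]); // 3rd value of each group
        out[3].push_back(src[i + 3]); // 4th value of each group
    }
    return out;
}
```

The vectorized version should produce the same four streams, just eight values per stream per 512-bit register.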
I have been using _mm512_permutexvar_epi16 to get my unprocessed data into this format.
Below is the code I have so far.
#include <immintrin.h>
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>

int main()
{
    alignas(64) std::array<uint16_t, 128> unprocessedData;
    alignas(64) std::array<uint16_t, 32> processedData0, processedData1, processedData2, processedData3;

    // Index i of the result takes source element shuffleMask[i], so each
    // group of eight outputs collects every 4th input element.
    alignas(64) constexpr std::array<uint16_t, 32> shuffleMask {
        0, 4, 8, 12, 16, 20, 24, 28,
        1, 5, 9, 13, 17, 21, 25, 29,
        2, 6, 10, 14, 18, 22, 26, 30,
        3, 7, 11, 15, 19, 23, 27, 31,
    };

    // Prepare sample data: 0 1 2 3 0 1 2 3 ...
    for (uint16_t i {0}; i < unprocessedData.size(); i += 4)
    {
        unprocessedData[i] = 0;
        unprocessedData[i+1] = 1;
        unprocessedData[i+2] = 2;
        unprocessedData[i+3] = 3;
    }

    // Load the permutation indices once, outside the loop
    const __m512i mask {_mm512_load_si512(shuffleMask.data())};

    for (size_t i {0}; i < unprocessedData.size(); i += 32)
    {
        auto v {_mm512_loadu_epi16(&unprocessedData[i])};
        const __m512i permuted {_mm512_permutexvar_epi16(mask, v)};
        // Currently the permuted block is written back in place
        _mm512_storeu_epi16(&unprocessedData[i], permuted);
        // Somehow store values 0-7 of the permuted vector into processedData0
        // Store values 8-15 of the permuted vector into processedData1
        // Store values 16-23 of the permuted vector into processedData2
        // Store values 24-31 of the permuted vector into processedData3
    }
    return 0;
}