I need to shift the top bit of each element of b into the bottom of the corresponding element of a, like AVX512VBMI2 _mm256_shldi_epi16/32/64 with a count of 1.
Does anyone know a way to do this kind of shift?
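For the element sizes that VBMI2 does cover, a count of 1 can be emulated with plain AVX2. Adding a to itself shifts each element left by one without spilling into its neighbour, and shifting b right by (width - 1) leaves only its top bit, in bit 0. The sketch below is one possible approach, not a drop-in replacement for the VBMI2 intrinsics. It assumes GCC or Clang (for __attribute__((target("avx2")))) and an AVX2-capable x86 CPU. The names shld1_epi16/32/64 and check_shld1 are hypothetical helpers, not library functions.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical AVX2 stand-ins for _mm256_shldi_epi16/32/64(a, b, 1).
   Each element becomes (a << 1) | (b >> (width - 1)): a shifted left by one,
   with the top bit of the matching element of b moved into bit 0.
   a + a is a left shift by 1 that stays inside each element. */
__attribute__((target("avx2")))
static inline __m256i shld1_epi16(__m256i a, __m256i b) {
    return _mm256_or_si256(_mm256_add_epi16(a, a), _mm256_srli_epi16(b, 15));
}

__attribute__((target("avx2")))
static inline __m256i shld1_epi32(__m256i a, __m256i b) {
    return _mm256_or_si256(_mm256_add_epi32(a, a), _mm256_srli_epi32(b, 31));
}

__attribute__((target("avx2")))
static inline __m256i shld1_epi64(__m256i a, __m256i b) {
    return _mm256_or_si256(_mm256_add_epi64(a, a), _mm256_srli_epi64(b, 63));
}

/* Check all three widths against a scalar model on the same 32 bytes.
   Takes plain arrays so the caller never passes __m256i by value. */
__attribute__((target("avx2")))
static void check_shld1(const uint8_t a[32], const uint8_t b[32]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    uint16_t a16[16], b16[16], r16[16];
    uint32_t a32[8], b32[8], r32[8];
    uint64_t a64[4], b64[4], r64[4];

    /* x86 is little-endian, so these views match the vector lanes. */
    memcpy(a16, a, 32); memcpy(b16, b, 32);
    memcpy(a32, a, 32); memcpy(b32, b, 32);
    memcpy(a64, a, 32); memcpy(b64, b, 32);
    _mm256_storeu_si256((__m256i *)r16, shld1_epi16(va, vb));
    _mm256_storeu_si256((__m256i *)r32, shld1_epi32(va, vb));
    _mm256_storeu_si256((__m256i *)r64, shld1_epi64(va, vb));

    for (int i = 0; i < 16; i++)
        assert(r16[i] == (uint16_t)((a16[i] << 1) | (b16[i] >> 15)));
    for (int i = 0; i < 8; i++)
        assert(r32[i] == ((a32[i] << 1) | (b32[i] >> 31)));
    for (int i = 0; i < 4; i++)
        assert(r64[i] == ((a64[i] << 1) | (b64[i] >> 63)));
}
```

This only helps for 16/32/64-bit elements; VBMI2 itself has no 8-bit double shift.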
Example (8-bit elements, written in binary):
__m256i x = { 11001100, 00110011, 11001100, 00110011,... x16 }
__m256i y = { 10111100, 10001011, 11000010, 01100111,... x16 }
__m256i res = _mm256_shldi_epi16(x, y, 1);  // but with 8-bit elements; VBMI2 has no epi8 version
Then res contains:
{ 10011001, 01100111, 10011001, 01100110, ... x16 }
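The example uses 8-bit elements, for which neither VBMI2 nor AVX2 has a shift instruction. One possible sketch, assuming AVX2 and GCC or Clang: build a << 1 with a byte-wise add, then use a signed compare against zero to get 0xFF (that is, -1) in every byte of b whose top bit is set, and subtract that mask. Subtracting -1 adds 1, and since the low bit of a + a is always 0, that is the same as OR-ing in b's top bit. shld1_epi8, shld1_bytes and check_question_example are hypothetical names; filling all 32 bytes by repeating the question's four-byte pattern is an assumption about what its "... x16" stands for.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Per-byte (a << 1) | (b >> 7), using only AVX2 instructions. */
__attribute__((target("avx2")))
static inline __m256i shld1_epi8(__m256i a, __m256i b) {
    __m256i a2  = _mm256_add_epi8(a, a);                          /* a << 1 in each byte */
    __m256i neg = _mm256_cmpgt_epi8(_mm256_setzero_si256(), b);   /* 0xFF where b's top bit is set */
    return _mm256_sub_epi8(a2, neg);                              /* a2 - (-1) = a2 + 1 there */
}

/* Array wrapper so callers never pass __m256i across the target boundary. */
__attribute__((target("avx2")))
static void shld1_bytes(const uint8_t a[32], const uint8_t b[32], uint8_t out[32]) {
    __m256i r = shld1_epi8(_mm256_loadu_si256((const __m256i *)a),
                           _mm256_loadu_si256((const __m256i *)b));
    _mm256_storeu_si256((__m256i *)out, r);
}

/* The question's example, with its 4-byte pattern repeated to fill 32 bytes. */
static void check_question_example(void) {
    const uint8_t xp[4] = {0xCC, 0x33, 0xCC, 0x33}; /* 11001100 00110011 11001100 00110011 */
    const uint8_t yp[4] = {0xBC, 0x8B, 0xC2, 0x67}; /* 10111100 10001011 11000010 01100111 */
    const uint8_t rp[4] = {0x99, 0x67, 0x99, 0x66}; /* 10011001 01100111 10011001 01100110 */
    uint8_t x[32], y[32], res[32];
    for (int i = 0; i < 32; i++) {
        x[i] = xp[i % 4];
        y[i] = yp[i % 4];
    }
    shld1_bytes(x, y, res);
    for (int i = 0; i < 32; i++)
        assert(res[i] == rp[i % 4]);
}
```

The compare-and-subtract replaces a shift, mask and OR, which matters because AVX2 has no 8-bit shifts at all.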
(Editor's note: the question previously described this as _mm256_sllv_epi8. sllv is a variable-count shift, where the count for each element comes from the corresponding element of the other source; it is nothing like a double shift.)
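To make the distinction concrete with a member of the sllv family that does exist, AVX2's _mm256_sllv_epi32: the second operand only supplies per-element shift counts, none of its bits end up in the result, and counts of 32 or more produce 0. (There is no 8-bit sllv intrinsic.) The sketch assumes AVX2 and GCC or Clang; sllv_u32 and demo_sllv are hypothetical names.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Each 32-bit element of a is shifted left by the count in the matching
   element of count; counts above 31 give 0. No bits come from count itself. */
__attribute__((target("avx2")))
static void sllv_u32(const uint32_t a[8], const uint32_t count[8], uint32_t out[8]) {
    __m256i r = _mm256_sllv_epi32(_mm256_loadu_si256((const __m256i *)a),
                                  _mm256_loadu_si256((const __m256i *)count));
    _mm256_storeu_si256((__m256i *)out, r);
}

/* Entries 4 and 7 have counts >= 32 and become 0; in entry 5 the top bit
   is shifted out and lost rather than coming from another operand. */
static void demo_sllv(void) {
    const uint32_t a[8]     = {1, 1, 1, 1, 1, 0x80000001u, 5, 5};
    const uint32_t count[8] = {0, 1, 2, 31, 32, 1, 3, 100};
    const uint32_t want[8]  = {1, 2, 4, 0x80000000u, 0, 2, 40, 0};
    uint32_t out[8];
    sllv_u32(a, count, out);
    for (int i = 0; i < 8; i++)
        assert(out[i] == want[i]);
}
```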