I have to translate the following instructions from SSE to Neon:
uint32_t r = _mm_cvtsi128_si32(_mm_shuffle_epi8(a, SHUFFLE_MASK));
Where:
static const __m128i SHUFFLE_MASK = _mm_setr_epi8(3, 7, 11, 15, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1);
So basically I have to take the 4th, 8th, 12th and 16th bytes from the register and put them into a uint32_t. This looks like a packing operation (in SSE I seem to remember I used a shuffle because it saves one instruction compared to packing; this example shows the use of packing instructions).
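To make the intended result concrete, here is a minimal scalar sketch of what the shuffle plus `_mm_cvtsi128_si32` computes. The function names `gather_every_fourth_byte` and `gather_self_check` are made up for illustration, and the register is modelled as a plain array of 16 bytes in memory order.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of _mm_cvtsi128_si32(_mm_shuffle_epi8(v, SHUFFLE_MASK)):
   bytes 3, 7, 11 and 15 of the 16-byte register land in bytes 0..3 of
   the 32-bit result (byte 3 becomes the least significant byte). */
static uint32_t gather_every_fourth_byte(const uint8_t v[16])
{
    uint32_t r = 0;
    for (int i = 0; i < 4; ++i) {
        r |= (uint32_t)v[4 * i + 3] << (8 * i);
    }
    return r;
}

/* Hypothetical self-check: with register bytes 0..15, the selected
   bytes are 3, 7, 11, 15, so the result is 0x0F0B0703. */
static void gather_self_check(void)
{
    uint8_t v[16];
    for (int i = 0; i < 16; ++i) {
        v[i] = (uint8_t)i;
    }
    assert(gather_every_fourth_byte(v) == 0x0F0B0703u);
}
```

Any Neon version should produce the same value as this reference for the same 16 input bytes.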
How does this operation translate in Neon?
Should I use packing instructions?
How do I then extract the 32 bits? (Is there anything equivalent to _mm_cvtsi128_si32?)
Edit:
To start with, vgetq_lane_u32 should let me replace _mm_cvtsi128_si32 (but I will have to cast my uint8x16_t to uint32x4_t):
uint32_t vgetq_lane_u32(uint32x4_t vec, __constrange(0,3) int lane);
Or I could store the lane directly with vst1q_lane_u32:
void vst1q_lane_u32(__transfersize(1) uint32_t * ptr, uint32x4_t val, __constrange(0,3) int lane); // VST1.32 {d0[0]}, [r0]
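A rough sketch of how these two intrinsics might be used together with the cast, assuming the vector starts out as a uint8x16_t. The helper names and the `HAVE_NEON_SKETCH` macro are hypothetical, and the `memcpy` branch is only a stand-in so the snippet also builds on machines without Neon. It covers only the extraction of the low 32 bits, not the byte packing itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_SKETCH 1
#endif

/* Returns the low 32 bits of the 16-byte vector loaded from `bytes`,
   i.e. the counterpart of _mm_cvtsi128_si32. */
static uint32_t low_u32_of_vector(const uint8_t bytes[16])
{
#ifdef HAVE_NEON_SKETCH
    uint8x16_t v8 = vld1q_u8(bytes);
    /* Reinterpret the same 128 bits as four 32-bit lanes. */
    uint32x4_t v32 = vreinterpretq_u32_u8(v8);
    return vgetq_lane_u32(v32, 0);
#else
    /* Fallback without Neon: read the first four bytes as a uint32_t. */
    uint32_t r;
    memcpy(&r, bytes, sizeof r);
    return r;
#endif
}

/* Same idea, but writing lane 0 straight to memory. */
static void store_low_u32_of_vector(const uint8_t bytes[16], uint32_t *out)
{
#ifdef HAVE_NEON_SKETCH
    vst1q_lane_u32(out, vreinterpretq_u32_u8(vld1q_u8(bytes)), 0);
#else
    memcpy(out, bytes, sizeof *out);
#endif
}
```

The reinterpret cast only changes the type seen by the compiler, so the choice between vgetq_lane_u32 and vst1q_lane_u32 probably comes down to whether the value is needed in a general-purpose register or in memory.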