I have a __m256i vector:
static char __attribute__((aligned(32))) str[32] = "Hello@@, This is my text !!!";
__m256i vec_str=_mm256_load_si256((const __m256i*) str);
Now, based on a 32-bit integer bit-mask, I want to remove some characters from this str (vec_str) and move the later characters backward to fill the gap. For example, I want to remove @@, and !!!, and this is my bit-mask, where bit i controls byte i of the string (bit 1 => keep | bit 0 => remove):
unsigned int mask=0b11110000111111111111111100011111;
In other words, I want to delete every byte whose corresponding mask bit is unset (0) and shift the later bytes down to close the gaps. The result I expect:
vec_str="Hello This is my text";
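To pin down the exact behavior I am after, here is a minimal scalar sketch. The function name compress_bytes is just a placeholder for illustration, and I am assuming that bit i of the mask controls byte i and that the tail of the output should be zero-filled. The mask used when I tried it, 0xF0FFFF1F, clears bits 5-7 and 24-27.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference (hypothetical helper, not a SIMD solution):
   copy src[i] to the next free slot of dst when bit i of mask is set,
   then zero-fill the rest of the 32-byte output.
   Returns the number of bytes kept. */
static size_t compress_bytes(char dst[32], const char src[32], uint32_t mask)
{
    assert(dst != NULL && src != NULL);

    size_t out = 0;
    for (unsigned i = 0; i < 32; ++i) {
        if ((mask >> i) & 1u)        /* bit i set => keep byte i */
            dst[out++] = src[i];
    }
    memset(dst + out, 0, 32 - out);  /* assumption: unused tail becomes zero */
    return out;
}
```

Calling it on the 32-byte str buffer with that mask gives "Hello This is my text" followed by zero bytes. This loop is what I would like to replace with a few vector instructions.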
I thought about indirect approaches, such as splitting the bit-mask into two 16-bit masks and building a lookup table with 65536x16 entries for vpshufb (_mm256_shuffle_epi8), then using the table twice with a copy through memory, but I think this approach is too heavy (too long). Is there a better, more direct way, or a way that avoids the copy through memory?
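To make the table idea concrete, here is a rough portable model of it. Everything named here (compress_lut, build_compress_lut, shuffle16, compress32_lut) is hypothetical, and shuffle16 is only a scalar stand-in for what a byte shuffle such as _mm_shuffle_epi8 does on one 16-byte lane. It is a sketch of the approach I would like to avoid, not a tuned implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* For each 16-bit mask: the source indices that pack the kept bytes to
   the front. Unused slots hold 0x80, which a byte shuffle turns into 0.
   Size: 65536 * 16 bytes = 1 MiB. */
static uint8_t compress_lut[65536][16];

static void build_compress_lut(void)
{
    for (uint32_t m = 0; m < 65536; ++m) {
        int out = 0;
        for (int i = 0; i < 16; ++i)
            if (m & (1u << i))
                compress_lut[m][out++] = (uint8_t)i;
        while (out < 16)
            compress_lut[m][out++] = 0x80;
    }
}

/* Scalar model of a 16-byte shuffle: a control byte with its high bit
   set yields zero, otherwise its low 4 bits select a source byte. */
static void shuffle16(uint8_t dst[16], const uint8_t src[16],
                      const uint8_t ctrl[16])
{
    for (int i = 0; i < 16; ++i)
        dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
}

static size_t popcount16(uint32_t m)
{
    size_t n = 0;
    while (m) { m &= m - 1; ++n; }   /* clear the lowest set bit */
    return n;
}

/* Two table lookups, two shuffles, and two stores through memory.
   The second store starts right after the bytes kept from the low half.
   Call build_compress_lut() once before using this. */
static size_t compress32_lut(uint8_t dst[32], const uint8_t src[32],
                             uint32_t mask)
{
    uint32_t lo = mask & 0xFFFFu;
    uint32_t hi = mask >> 16;
    size_t n_lo = popcount16(lo);
    uint8_t lane[16];

    memset(dst, 0, 32);
    shuffle16(lane, src, compress_lut[lo]);
    memcpy(dst, lane, 16);
    shuffle16(lane, src + 16, compress_lut[hi]);
    memcpy(dst + n_lo, lane, 16);    /* n_lo <= 16, so this stays in bounds */
    return n_lo + popcount16(hi);
}
```

The 1 MiB table and the overlapping store through memory are exactly the parts that make this feel heavy to me.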