I am trying to vectorize the following compaction loop with AVX2 (`dest`, `data`, and `mask` are `int32_t` pointers):
```c
int j = 0;
for (int i = 0; i < N; ++i) {
    if (mask[i]) {
        dest[j++] = data[i];  /* keep data[i] only where mask[i] != 0 */
    }
}
```
In AVX2, the closest instruction I can find is:
```c
_mm256_maskstore_epi32(dest, _mm256_loadu_si256((const __m256i *)mask), _mm256_loadu_si256((const __m256i *)data));
```
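But that writes each selected element to its original position instead of packing the selected elements together. It also tests only the sign bit of each mask element, so a 0/1 mask would not work as is. I would still have to do the packing manually, which seems inefficient.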
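One common way to avoid a separate packing pass is a lookup-table "left-pack", offered here as a sketch under assumptions, not a definitive answer. The steps are:

1. Compare each group of 8 mask elements against zero.
2. Turn the result into an 8-bit index with `_mm256_movemask_ps`.
3. Look up a permutation that moves the selected lanes to the front, and apply it with `_mm256_permutevar8x32_epi32`.
4. Store all 8 lanes with an unaligned store, then advance `j` by the popcount of the index.

This sketch makes the following assumptions:

* It assumes GCC or Clang, for `__attribute__((target("avx2")))`, `__builtin_cpu_supports` and `__builtin_popcount`.
* `init_pack_lut` must be called once before use.
* `dest` has room for `n` elements. Slots past the returned count may be overwritten with junk, because every store writes 8 lanes.
* The names `pack_lut`, `init_pack_lut`, `compress_scalar`, `compress_avx2` and `compress_epi32` are hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical table: pack_lut[m][k] = lane index of the k-th set bit of m
   (unused slots are 0). 256 entries * 8 * 4 bytes = 8 KiB. */
static uint32_t pack_lut[256][8];

/* Hypothetical helper: must be called once before compress_epi32. */
static void init_pack_lut(void) {
    for (int m = 0; m < 256; ++m) {
        int k = 0;
        for (int b = 0; b < 8; ++b)
            if (m & (1 << b))
                pack_lut[m][k++] = (uint32_t)b;
        for (; k < 8; ++k)
            pack_lut[m][k] = 0;
    }
}

/* Scalar version of the loop from the question; returns the packed count. */
static size_t compress_scalar(int32_t *dest, const int32_t *data,
                              const int32_t *mask, size_t n) {
    size_t j = 0;
    for (size_t i = 0; i < n; ++i)
        if (mask[i])
            dest[j++] = data[i];
    return j;
}

/* AVX2 left-pack (GCC/Clang target attribute, so no -mavx2 flag is needed).
   Invariant j <= i keeps every 8-lane store inside dest[0 .. n-1]. */
__attribute__((target("avx2")))
static size_t compress_avx2(int32_t *dest, const int32_t *data,
                            const int32_t *mask, size_t n) {
    const __m256i zero = _mm256_setzero_si256();
    size_t i = 0, j = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i m = _mm256_loadu_si256((const __m256i *)(mask + i));
        __m256i v = _mm256_loadu_si256((const __m256i *)(data + i));
        /* bit b of is_zero is set when mask[i + b] == 0 */
        unsigned is_zero = (unsigned)_mm256_movemask_ps(
            _mm256_castsi256_ps(_mm256_cmpeq_epi32(m, zero)));
        unsigned keep = ~is_zero & 0xFFu;  /* lanes with a nonzero mask */
        __m256i idx = _mm256_loadu_si256((const __m256i *)pack_lut[keep]);
        /* result lane k = v[idx[k]]: kept lanes move to the front */
        _mm256_storeu_si256((__m256i *)(dest + j),
                            _mm256_permutevar8x32_epi32(v, idx));
        j += (size_t)__builtin_popcount(keep);
    }
    /* scalar tail for the last n % 8 elements */
    return j + compress_scalar(dest + j, data + i, mask + i, n - i);
}

/* Hypothetical entry point: picks the AVX2 path only if the CPU supports it. */
size_t compress_epi32(int32_t *dest, const int32_t *data,
                      const int32_t *mask, size_t n) {
    if (__builtin_cpu_supports("avx2"))
        return compress_avx2(dest, data, mask, n);
    return compress_scalar(dest, data, mask, n);
}
```

Any nonzero mask value counts as "selected" here, matching the scalar loop rather than `maskstore`'s sign-bit rule. Computing the permutation with BMI2 `_pext_u64` instead of a table is a known alternative that avoids the 8 KiB table. That variant is not shown here.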