I want to compare two vectors of 16 bytes and get every matching index. A small example to illustrate what I want:
fn get_matching_idx(arr1: &[u8], arr2: &[u8]) {
    let vec1 = u8x16::load_aligned(arr1);
    let vec2 = u8x16::load_aligned(arr2);
    let matches = vec1.eq(vec2);
    for i in 0..16 {
        if matches.extract_unchecked(i) {
            // Do something with the index
        }
    }
}
Ideally, I'd just want to "Do something" for the set indices, rather than checking every single one (there will be a low number of matches).
Is there a way to get the matching indices using intrinsics, rather than iterating through the whole vector? With gcc, for example, I could use _mm_movemask_epi8 to bit-pack the vector and then apply __builtin_clz repeatedly to get the index of the first set bit, clearing that bit each time (this is faster for sparse masks, which is what I expect to have). Alternatively, I could use a lookup table that does the right thing for each nibble of my bit-packed integer (e.g. the first answer here).
Is there an equivalent of these instructions in Rust?
I'm compiling for an Intel x86-64 processor and cross platform support is not a requirement.
NOTE: I'd prefer a solution in native (safe) Rust, but this is not a hard requirement. I am fine with writing unsafe Rust, or even using some sort of FFI to link to the aforementioned methods.
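
For reference, a minimal sketch of the movemask approach described above, written against the `std::arch::x86_64` intrinsics, might look like the following. It assumes unaligned loads from fixed-size `[u8; 16]` arrays rather than the aligned slice loads in my example, and `match_mask` and `for_each_match` are hypothetical names chosen for illustration. It uses `trailing_zeros` (the counterpart of `__builtin_ctz`) so that indices come out lowest first.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8};

/// Returns a mask whose bit `i` is set when `a[i] == b[i]` (hypothetical helper).
#[cfg(target_arch = "x86_64")]
fn match_mask(a: &[u8; 16], b: &[u8; 16]) -> u32 {
    // SAFETY: SSE2 is part of the x86-64 baseline, and each unaligned load
    // reads exactly the 16 bytes behind a valid array reference.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        // 0xFF in every byte lane where the inputs are equal, 0x00 elsewhere.
        let eq = _mm_cmpeq_epi8(va, vb);
        // Pack the top bit of each lane into the low 16 bits of the result.
        _mm_movemask_epi8(eq) as u32
    }
}

/// Scalar fallback so the sketch still builds on non-x86-64 targets.
#[cfg(not(target_arch = "x86_64"))]
fn match_mask(a: &[u8; 16], b: &[u8; 16]) -> u32 {
    (0..16).filter(|&i| a[i] == b[i]).fold(0, |m, i| m | (1 << i))
}

/// Calls `f` once per matching index, lowest index first (hypothetical helper).
fn for_each_match(a: &[u8; 16], b: &[u8; 16], mut f: impl FnMut(usize)) {
    let mut mask = match_mask(a, b);
    while mask != 0 {
        // Index of the lowest set bit is the next matching position.
        f(mask.trailing_zeros() as usize);
        // Clear the lowest set bit so the loop runs once per match.
        mask &= mask - 1;
    }
}
```

The loop body runs once per set bit, so the cost scales with the number of matches rather than with the vector width, which is the property I am after for sparse masks.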