
I have an `__m128i` vector of chars. Using `_mm_cmpeq_epi8()` I found the position of a char I'm interested in.

My question is: is there an efficient way to test the next char after the found position, using the same vector?

Best!

πάντα ῥεῖ
niXman
  • I assume you're using `_mm_movemask_epi8` + `tzcnt` or `__builtin_ctz` (BSF) to find the first match position? If so, clear the lowest set bit and `tzcnt` again: `msk &= msk-1`, which can compile to [`blsr`](https://www.felixcloutier.com/x86/blsr) if you let the compiler use BMI1 instructions, otherwise LEA+AND. Your `_mm_cmpeq_epi8` already tested all 16 chars; the part you need to repeat is what you do with the mask, which you didn't mention in the question. If I'm guessing wrong about what you mean, edit the question to clarify. – Peter Cordes Oct 29 '22 at 06:28
  • Semi-related: if you have AVX-512BW, you can get an array of match positions with `vpcompressb`: see [AVX Search Array UB with zero input](https://stackoverflow.com/a/74230689) – Peter Cordes Oct 29 '22 at 06:32
  • Or do you mean compare against something else, to check for a 2-byte match after finding a candidate starting position? After getting the position of the match as described in my last comment, the next char is at `(char*)p + offset_within_vec + 1`. Access it in memory (where it will hit in L1d cache because you just loaded from there), not by indexing the vector that was an input to `pcmpeqb`; the load hardware is great at variable indexing, while the SIMD hardware would need multiple extra instructions, including a shuffle like SSSE3 `pshufb`. – Peter Cordes Oct 29 '22 at 07:32
  • IDK if there's a clever bithack to combine two `pcmpeqb/pmovmskb` bitmaps to find positions where the first has a bit set followed by the 2nd having a bit set... Oh, maybe `mask2 & (mask1 << 1)`. Or vector shift/AND before movemask. Or `_mm_cmpeq_epi16` at two offsets. But IDK which, if any, of these ideas you might be looking for. – Peter Cordes Oct 29 '22 at 07:38
  • @PeterCordes _Or do you mean compare against something else, to check for a 2-byte match after finding a candidate starting position?_ - correct! _Access it in memory (where it will hit in L1d cache because you already just loaded from there), not by indexing the vector that was an input to pcmpeqb;_ - thanks, I didn't know that! – niXman Oct 29 '22 at 09:48

0 Answers