Let's say I want to find one byte (think memchr) quickly. To do that, I could:
- unroll the search loop: compare several consecutive array elements, logically AND the results, etc.
- batch-compare bytes by XORing a union{uint64_t,char[8]}* with a reference value consisting of the search byte repeated 8 times.
The second optimization is not effective unless the CPU has an instruction that logically ANDs all the bytes of a wide value together (treating each byte as a boolean), i.e. one that tells me whether any byte of the XOR result is zero.
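To make the second idea concrete, here is a minimal sketch of what I have in mind. The names find_byte and any_byte_zero are hypothetical, made up for illustration, and memcpy is used instead of the union only to sidestep alignment and aliasing concerns. The per-byte loop in any_byte_zero is exactly the step I would like a single instruction to replace.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the naive "logical AND of all bytes" test.
 * Returns 1 if at least one byte of v is zero, 0 otherwise.
 * This is the part a dedicated CPU instruction would ideally replace. */
static int any_byte_zero(uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        if (((v >> (8 * i)) & 0xFF) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical memchr-like search (not a library function). */
static const void *find_byte(const void *buf, int c, size_t n)
{
    const unsigned char *p = buf;
    const unsigned char b = (unsigned char)c;
    /* Reference value: the search byte repeated 8 times. */
    const uint64_t pattern = 0x0101010101010101ULL * b;

    /* Batch step: after the XOR, a matching byte becomes zero. */
    while (n >= 8) {
        uint64_t word;
        memcpy(&word, p, 8);
        if (any_byte_zero(word ^ pattern))
            break;              /* a match lies in these 8 bytes */
        p += 8;
        n -= 8;
    }

    /* Locate the exact position, and handle the tail shorter than 8. */
    for (; n > 0; p++, n--) {
        if (*p == b)
            return p;
    }
    return NULL;
}
```

As written, any_byte_zero does as many comparisons as the plain byte loop, which is why the batching gains nothing without a cheaper way to do that test.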
Do common architectures (x86, ARM, MIPS, SPARC, or whatever) have extensions for this?
This question is not C-specific.