0

Let's say I want to find one byte (think memchr) quickly. If I want that, I may:

  • unroll search loop: compare multiple subsequent array elements, logically AND results, etc
  • batch-compare bytes using XORing union{uint64_t,char[8]}* and reference consisting from the search byte repeated 8 times.

The second optimization is not effective unless I have a CPU instruction for logically multiplying all bytes (treating each byte of wide value as binary value) of a value.

Do common architectures (x86, ARM, MIPS, SPARC whatever) have extensions for this?

This question is not C-specific.

Euri Pinhollow
  • 332
  • 2
  • 17
  • Are you asking about vector instructions? Many platforms have them (e.g. SSEx and AVX on Intel), but they differ quite significantly from platform to platform. – Oliver Charlesworth Apr 17 '17 at 15:20
  • 2
    Measure the speed of memchr () before you try to do anything tricky. – gnasher729 Apr 17 '17 at 15:22
  • @OliverCharlesworth yes. Architecutre-specific instructions are fine. – Euri Pinhollow Apr 17 '17 at 15:30
  • @gnasher729 good point but not exactly the answer I am looking for. https://github.com/lattera/glibc/blob/master/string/memchr.c has interesting implementation though. – Euri Pinhollow Apr 17 '17 at 15:35
  • 1
    [Find the first instance of a character using simd](http://stackoverflow.com/q/40915243/995714), [Compare 16 byte strings with SSE](http://stackoverflow.com/q/31999284/995714), [Implementing strcmp, strlen, and strstr using SSE 4.2 instructions](https://www.strchr.com/strcmp_and_strlen_using_sse_4.2). With AVX512 you can do operations on 64 bytes at once, so it's much faster than your uint64_t trick – phuclv Apr 17 '17 at 15:56
  • @LưuVĩnhPhúc make it an answer. :3 – Euri Pinhollow Apr 17 '17 at 15:58

0 Answers0