6

Disclaimer


I am looking for the fastest way to identify the first occurrence of a given byte in a byte buffer.

This is reminiscent of looking for the first occurrence of a character in a string except that:

  • the byte buffer is not NUL-terminated, instead I have an explicit length (and possibly embedded NUL characters)
  • the byte buffer is not allocated in a string or vector, I am only handed down a slice (aka, pointer & length)

The basic solution is:

size_t search(char const* buffer, size_t length, char c) {
    return std::find(buffer, buffer + length, c) - buffer;
}

However, a quick round-trip with the Godbolt compiler (-O2 -msse2 -mavx) does not show any hint of a vectorized instruction, only some unrolling, so I am wondering whether this is the optimal.

Is there a faster way to find the first occurrence of a given byte in a buffer?

Note: only the first occurrence matters.

Note: I am exclusively concerned with modern x86_64 CPUs on Linux, though I encourage answers to be as generic as possible and mention assumptions clearly.

Community
  • 1
  • 1
Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • 2
    Maybe try [`memchr`](https://linux.die.net/man/3/memchr) - it's like `strchr` but it doesn't require a NUL-terminated string ? – Paul R Nov 16 '16 at 13:14
  • 1
    It’s dismaying that `std::find` isn’t optimised to take advantage of compiler intrinsics on GCC. Somebody should write a patch, it’s such an obvious optimisation. – Konrad Rudolph Nov 16 '16 at 13:46
  • @KonradRudolph: I am quite surprised as well, especially since according to David Haim the optimization is done on VC++. Maybe a concern about inlining? (As in, a pure C++ implementation can be compile-time evaluated, an assembly one cannot) – Matthieu M. Nov 16 '16 at 13:48
  • @MatthieuM. at least on VC++, all the `strXXX` and `memXXX` functions will be compile-time evaluated on compile time if the data is known on compile time. I don't think it's a technical issue that prevent GCC from using it – David Haim Nov 16 '16 at 13:55
  • @DavidHaim: Ah, probably because resolving to intrinsics, the compiler knows what their functionality is. – Matthieu M. Nov 16 '16 at 13:56

1 Answers1

4

you can use memchr , which is usually implemented as an intrinsic function, and is usually (from my experience) much faster than any hand-rolled loop.

http://en.cppreference.com/w/c/string/byte/memchr

Edit: at least on VC++ (and I bet on GCC as well, I haven't checked), std::find will use memchr anyway if you look for a byte , so I would check if memchr actually make the program run faster.

David Haim
  • 25,446
  • 3
  • 44
  • 78
  • An explanation of `memchr` implementation can be found [here](http://stackoverflow.com/questions/525123/how-does-memchr-work-under-the-hood). Notes from [BurntSushi's Rust memchr crate](https://github.com/BurntSushi/rust-memchr/blob/master/src/lib.rs) hint that libc's `memchr` is slowish on Windows though. – Matthieu M. Nov 16 '16 at 13:23
  • From Godbolt, no, `std::find` on a `char` is not reduced to a `memchr` by gcc (6.2). – Matthieu M. Nov 16 '16 at 13:24
  • @MatthieuM. so here is a something to try out :) – David Haim Nov 16 '16 at 13:25
  • Yes, it's a really good contender certainly; especially since I don't have to care about Windows compatibility in my case. – Matthieu M. Nov 16 '16 at 13:26