I really like std::byte as a distinct type that implements the concept of a byte as specified in the C++ language definition. What I don't like is that modern C++ compilers produce less optimized code for it when it is used with the standard algorithms.
Here I'm playing with a function that checks the first 4 bytes of a header (you can follow my snippets on Godbolt):
#include <algorithm>  // std::equal
#include <array>
#include <cstddef>    // std::byte

bool func_bytes(const std::array<std::byte, 1024>& buf) {
    constexpr std::array<std::byte, 4> header {
        std::byte{0xDE}, std::byte{0xAD}, std::byte{0xBE}, std::byte{0xAF}
    };
    return std::equal(header.begin(), header.end(), buf.begin());
}
This produces the following assembly on x86-64 gcc trunk:
func_bytes(std::array<std::byte, 1024ul> const&):
        cmp     BYTE PTR [rdi], -34
        jne     .L5
        cmp     BYTE PTR [rdi+1], -83
        jne     .L5
        cmp     BYTE PTR [rdi+2], -66
        jne     .L5
        cmp     BYTE PTR [rdi+3], -81
        sete    al
        ret
.L5:
        xor     eax, eax
        ret
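The negative immediates in that listing are just the header bytes printed as signed 8-bit values. A minimal sketch that checks this mapping at compile time; the helper name as_signed_imm8 is made up for illustration and is not part of the original snippets:

```cpp
// The four header bytes used in the snippets.
constexpr unsigned char header_bytes[] = {0xDE, 0xAD, 0xBE, 0xAF};

// Hypothetical helper: the value an assembler listing shows when the
// same 8 bits are printed as a signed (two's complement) immediate.
constexpr int as_signed_imm8(unsigned char b) {
    return b < 128 ? b : b - 256;
}

// Each cmp in the listing tests one header byte, in order.
static_assert(as_signed_imm8(header_bytes[0]) == -34);  // 0xDE
static_assert(as_signed_imm8(header_bytes[1]) == -83);  // 0xAD
static_assert(as_signed_imm8(header_bytes[2]) == -66);  // 0xBE
static_assert(as_signed_imm8(header_bytes[3]) == -81);  // 0xAF
```

So the byte-wise version performs four separate one-byte compares, one per header byte, with an early exit after each mismatch.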
If I replace std::byte with unsigned char, the compiler optimizes the check down to a single dword comparison:
bool func_chars(const std::array<unsigned char, 1024>& buf) {
    constexpr std::array<unsigned char, 4> header {0xDE, 0xAD, 0xBE, 0xAF};
    return std::equal(header.begin(), header.end(), buf.begin());
}
Here is the assembly produced:
.LC0:
        .string "\336\255\276\257"
func_chars(std::array<unsigned char, 1024ul> const&):
        mov     eax, DWORD PTR [rdi]
        cmp     DWORD PTR .LC0[rip], eax
        sete    al
        ret
My solution for an optimized std::byte version is to use the old friends memcmp() and memcpy(), which are translated into the compiler's builtins:
#include <cstring>  // std::memcmp

bool func_bytes_memcmp(const std::array<std::byte, 1024>& buf) {
    constexpr std::array<std::byte, 4> header {
        std::byte{0xDE}, std::byte{0xAD}, std::byte{0xBE}, std::byte{0xAF}
    };
    return 0 == std::memcmp(header.data(), buf.data(), header.size());
}
This produces the smallest code:
func_bytes_memcmp(std::array<std::byte, 1024ul> const&):
        cmp     DWORD PTR [rdi], -1346458146
        sete    al
        ret
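If the memcmp() trick is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch: starts_with_bytes and func_bytes_helper are hypothetical names of mine, not anything from the standard library, and I have not verified that every compiler folds it into the same single compare.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: true if `buf` begins with the elements of `prefix`.
template <typename T, std::size_t M, std::size_t N>
bool starts_with_bytes(const std::array<T, M>& buf,
                       const std::array<T, N>& prefix) {
    static_assert(N > 0 && N <= M, "prefix must be non-empty and fit in buf");
    // memcmp compares object representations, so restrict it to types where
    // equal values always have identical bytes (std::byte, unsigned char, ...).
    static_assert(std::has_unique_object_representations_v<T>,
                  "memcmp is not a valid equality test for this type");
    return std::memcmp(buf.data(), prefix.data(), N * sizeof(T)) == 0;
}

// Same check as func_bytes_memcmp, expressed through the helper.
bool func_bytes_helper(const std::array<std::byte, 1024>& buf) {
    constexpr std::array<std::byte, 4> header {
        std::byte{0xDE}, std::byte{0xAD}, std::byte{0xBE}, std::byte{0xAF}
    };
    return starts_with_bytes(buf, header);
}
```

The static_assert on std::has_unique_object_representations_v is there because a raw memory comparison is only equivalent to operator== when the type has no padding and no multiple encodings of the same value.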
Is this the modern C++ approach?