I have a project where it is important to do speedy conversions from bytes (`char`) to hex-formatted strings (`"00"` to `"ff"`).
The problem I have is that my conversion function slows down when I move it from my test file to my conversion library.
The function uses a `std::vector<std::string>` as a lookup table for the precomputed strings.
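For context, the table holds one precomputed two-character string per byte value. A minimal sketch of such an initialization (illustrative only; my actual setup code may differ):

```cpp
#include <string>
#include <vector>

// Illustrative initialization: one two-character hex string per byte value.
static const std::vector<std::string> lookupvector = [] {
    std::vector<std::string> v;
    v.reserve(256);
    const char digits[] = "0123456789abcdef";
    for (int i = 0; i < 256; ++i)
        v.push_back({digits[i >> 4], digits[i & 0x0f]});
    return v;
}();
```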
The speed difference when testing is 4 µs in the test file versus 8 µs when called from the library, using 1000 iterations of the conversion function per benchmark run.
Can anyone help me understand what is going on? To my eyes, the same code is taking twice the time to execute.
Test code with Catch2 (partial):
BENCHMARK("fast, local")
{
auto l = [](){
string x;
for (int i = 0; i < 1000; ++i) {
// this is exactly how conv::char2hex works as well
x += lookupvector[conv::byte2int(random_bytes[i])];
}
return x;
};
return l();
};
BENCHMARK("slow, lib")
{
auto l = [](){
string x;
for (int i = 0; i < 1000; ++i) {
x += conv::char2hex(random_bytes[i]);
}
return x;
};
return l();
};
Function code in `conversion.h`:
```cpp
inline string char2hex(const char &x)
{
    return lookupvector[byte2int(x)];
}
```
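`byte2int` is not shown above; it simply maps the (possibly signed) `char` onto the index range 0..255, along the lines of:

```cpp
// Sketch of byte2int: maps a (possibly signed) char onto the index range 0..255.
inline int byte2int(char x)
{
    return static_cast<unsigned char>(x);
}
```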
Compiled with CMake, using Clang, in release mode (`-O2`).
**Update:**
- `random_bytes` is a pre-allocated `std::vector<char>` with 1M entries for testing.
- The `BENCHMARK` macro runs the test repeatedly for better statistics (a minimal harness sketch is at the end of this post).
- Increasing the loop count 10x does not significantly change the timing difference.
- `x.reserve(2000);` does not change anything; I believe that allocation is already optimized.
- Changing the order of the tests does not change anything.
- `-flto` does not improve the situation.
- Having the conversion function and lookup table in a local header instead of the library does not improve the speed.
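For completeness, a minimal self-contained sketch of the benchmark harness, using Catch2 v3 headers (Catch2 v2 instead needs `CATCH_CONFIG_ENABLE_BENCHMARKING` and `<catch2/catch.hpp>`); the random-byte setup here is simplified:

```cpp
#include <catch2/catch_test_macros.hpp>
#include <catch2/benchmark/catch_benchmark.hpp>

#include <cstdlib>
#include <string>
#include <vector>

#include "conversion.h"  // conv::char2hex

// 1M pseudo-random input bytes, pre-allocated once before the benchmarks run.
static const std::vector<char> random_bytes = [] {
    std::vector<char> v(1'000'000);
    for (char &c : v)
        c = static_cast<char>(std::rand() & 0xff);
    return v;
}();

TEST_CASE("char2hex benchmarks")
{
    BENCHMARK("slow, lib")
    {
        std::string x;
        for (int i = 0; i < 1000; ++i)
            x += conv::char2hex(random_bytes[i]);
        return x;
    };
}
```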