I have a project where it is important to do speedy conversions from bytes (`char`) to hex-formatted strings (`"00"` to `"ff"`).
The problem I have is that my conversion function slows down when I move it from my test file to my conversion library.
The function uses a `std::vector<std::string>` as a lookup table for the precomputed strings.
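For context, the table holds one precomputed two-character string per byte value. A minimal sketch of such an initialization (illustrative only; my actual setup code may differ):

```cpp
#include <string>
#include <vector>

// Illustrative initialization: one two-character hex string per byte value.
static const std::vector<std::string> lookupvector = [] {
    std::vector<std::string> v;
    v.reserve(256);
    const char digits[] = "0123456789abcdef";
    for (int i = 0; i < 256; ++i)
        v.push_back({digits[i >> 4], digits[i & 0x0f]});
    return v;
}();
```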
The speed difference when testing is 4 µs in the test file versus 8 µs when called from the library, using 1000 iterations of the conversion function per benchmark run.
Can anyone help me understand what is going on? To my eyes, the same code is taking twice the time to execute.
Test code with Catch2 (partial):
BENCHMARK("fast, local")
{
auto l = [](){
string x;
for (int i = 0; i < 1000; ++i) {
// this is exactly how conv::char2hex works as well
x += lookupvector[conv::byte2int(random_bytes[i])];
}
return x;
};
return l();
};
BENCHMARK("slow, lib")
{
auto l = [](){
string x;
for (int i = 0; i < 1000; ++i) {
x += conv::char2hex(random_bytes[i]);
}
return x;
};
return l();
};
Function code in `conversion.h`:
```cpp
inline string char2hex(const char &x)
{
    return lookupvector[byte2int(x)];
}
```
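`byte2int` is not shown above; it simply maps the (possibly signed) `char` onto the index range 0..255, along the lines of:

```cpp
// Sketch of byte2int: maps a (possibly signed) char onto the index range 0..255.
inline int byte2int(char x)
{
    return static_cast<unsigned char>(x);
}
```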
Compiled with CMake, using Clang, in release mode (`-O2`).
**Update:**
- `random_bytes` is a pre-allocated `std::vector<char>` with 1M entries for testing.
- The `BENCHMARK` macro runs the test repeatedly for better statistics (a minimal harness sketch is at the end of this post).
- Increasing the loop count 10x does not significantly change the timing difference.
- `x.reserve(2000);` does not change anything; I believe that allocation is already optimized.
- Changing the order of the tests does not change anything.
- `-flto` does not improve the situation.
- Having the conversion function and lookup table in a local header instead of the library does not improve the speed.
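For completeness, a minimal self-contained sketch of the benchmark harness, using Catch2 v3 headers (Catch2 v2 instead needs `CATCH_CONFIG_ENABLE_BENCHMARKING` and `<catch2/catch.hpp>`); the random-byte setup here is simplified:

```cpp
#include <catch2/catch_test_macros.hpp>
#include <catch2/benchmark/catch_benchmark.hpp>

#include <cstdlib>
#include <string>
#include <vector>

#include "conversion.h"  // conv::char2hex

// 1M pseudo-random input bytes, pre-allocated once before the benchmarks run.
static const std::vector<char> random_bytes = [] {
    std::vector<char> v(1'000'000);
    for (char &c : v)
        c = static_cast<char>(std::rand() & 0xff);
    return v;
}();

TEST_CASE("char2hex benchmarks")
{
    BENCHMARK("slow, lib")
    {
        std::string x;
        for (int i = 0; i < 1000; ++i)
            x += conv::char2hex(random_bytes[i]);
        return x;
    };
}
```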