C++ Calling function from another file slower than calling it from same file

Question

I'm currently translating my chess engine from Java to C++. After completing some parts of it, I noticed that calling those functions took much more time than it 'should'. When I put the called function in the same file as the main function, it's much faster, though (0.0002ms instead of 244.569ms). Why is that?

Here's the code:

main.cpp

#include <iostream>
#include <chrono>

using namespace std;

#include "utilities.h"

int main() {
    using std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;
    
    auto t1 = steady_clock::now();
    int end = 0;
    for (int i = 0; i < 100000000; i++) {
        unsigned long b = rotate180(0xFFFF000000000000);
    }

    auto t2 = steady_clock::now();
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    duration<double, std::milli> ms_double = t2 - t1;

    std::cout << ms_int.count() << "ms\n";
    std::cout << ms_double.count() << "ms\n";
}

utilities.cpp

#include <iostream>
using namespace std;

#include "utilities.h"

// from https://www.chessprogramming.org/Flipping_Mirroring_and_Rotating
unsigned long rotate180(unsigned long x) {
   const unsigned long h1 = 0x5555555555555555;
   const unsigned long h2 = 0x3333333333333333;
   const unsigned long h4 = 0x0F0F0F0F0F0F0F0F;
   const unsigned long v1 = 0x00FF00FF00FF00FF;
   const unsigned long v2 = 0x0000FFFF0000FFFF;
   x = ((x >>  1) & h1) | ((x & h1) <<  1);
   x = ((x >>  2) & h2) | ((x & h2) <<  2);
   x = ((x >>  4) & h4) | ((x & h4) <<  4);
   x = ((x >>  8) & v1) | ((x & v1) <<  8);
   x = ((x >> 16) & v2) | ((x & v2) << 16);
   x = ( x >> 32)       | ( x       << 32);
   return x;
}

utilities.h

#ifndef UTILITIES_H
#define UTILITIES_H

unsigned long rotate180(unsigned long x);

#endif

I'm aware that this example doesn't do much, but the slowdown already shows up here, so I expect to lose performance once I do more complex calculations.

Most likely, when you put everything in one file, the compiler notices that the `rotate180` function has no observable effects and that the return value is unused, so it doesn't run the loop at all. But you should check with a profiler or the generated assembly to confirm. — Yksisarvinen, Aug 17 '22 at 17:40
Read about [the "as-if rule"](https://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule). Your for-loop _effectively_ does nothing observable, so your C++ compiler is allowed to replace it with _nothing_. — Drew Dormann, Aug 17 '22 at 17:44
Note: [`unsigned long` may not be able to store 64 bits.](https://en.cppreference.com/w/c/language/arithmetic_types) You should be using `uint64_t` here (`#include <cstdint>`). In fact, `sizeof(unsigned long)` yields 4, i.e. 32 bits, for my x64 build using MSVC — fabian, Aug 17 '22 at 17:53
Ok, I obviously didn't know that, but makes perfect sense. Thank you — John The Fisherman, Aug 17 '22 at 17:54
If you put trivial functions into separate compilation units you should look into link time optimization (-flto in gcc/clang) to do whole program optimizations. Otherwise you get no optimization across compilation units at all. — Goswin von Brederlow, Aug 18 '22 at 15:26
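
The comments above point to two practical changes: a fixed-width 64-bit type, and letting the optimizer see the function body from other translation units. The following is a minimal sketch, not the asker's code. It assumes the function is moved into `utilities.h` and marked `inline`, which is an alternative to the `-flto` route and is not proposed in the thread itself. The build command in the trailing comment is only an example, and the output name `engine` is hypothetical.

```cpp
// utilities.h (sketch: header-only variant, an assumption, not the asker's layout)
#ifndef UTILITIES_H
#define UTILITIES_H

#include <cstdint>

// from https://www.chessprogramming.org/Flipping_Mirroring_and_Rotating
// `inline` lets this definition appear in every translation unit that
// includes the header, so the compiler can see (and inline) the body.
// std::uint64_t is exactly 64 bits wide, unlike unsigned long on MSVC.
inline std::uint64_t rotate180(std::uint64_t x) {
    const std::uint64_t h1 = 0x5555555555555555ULL;
    const std::uint64_t h2 = 0x3333333333333333ULL;
    const std::uint64_t h4 = 0x0F0F0F0F0F0F0F0FULL;
    const std::uint64_t v1 = 0x00FF00FF00FF00FFULL;
    const std::uint64_t v2 = 0x0000FFFF0000FFFFULL;
    x = ((x >>  1) & h1) | ((x & h1) <<  1);
    x = ((x >>  2) & h2) | ((x & h2) <<  2);
    x = ((x >>  4) & h4) | ((x & h4) <<  4);
    x = ((x >>  8) & v1) | ((x & v1) <<  8);
    x = ((x >> 16) & v2) | ((x & v2) << 16);
    x = ( x >> 32)       | ( x       << 32);
    return x;
}

#endif

// Alternative: keep the definition in utilities.cpp and enable link-time
// optimization, for example with GCC or Clang:
//   g++ -O2 -flto main.cpp utilities.cpp -o engine
```

Note that once the body is visible to the compiler, the benchmark loop in `main.cpp` can again be removed as dead code unless its result is actually used.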

score 2 · Answer 1 · answered Aug 17 '22 at 17:50

It's simply the optimizer removing "dead code", that is, code that has no effect or whose result is unused. The timing you see with the function in a separate file is (approximately) real; in the same-file case the loop has been removed, so there is no performance data to measure. If you want to profile more accurately, there are better ways of measuring a hotspot.
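
One common way to keep such a loop from being eliminated is to make its result observable. The sketch below is an illustration under assumptions, not part of the original answer: `BenchResult` and `time_rotate180` are hypothetical names, and the function uses `std::uint64_t` as suggested in the comments on the question. Each iteration feeds the previous result back in and adds it to a checksum that is returned to the caller.

```cpp
#include <chrono>
#include <cstdint>

// Same bit manipulation as in the question, using a fixed-width type.
std::uint64_t rotate180(std::uint64_t x) {
    const std::uint64_t h1 = 0x5555555555555555ULL;
    const std::uint64_t h2 = 0x3333333333333333ULL;
    const std::uint64_t h4 = 0x0F0F0F0F0F0F0F0FULL;
    const std::uint64_t v1 = 0x00FF00FF00FF00FFULL;
    const std::uint64_t v2 = 0x0000FFFF0000FFFFULL;
    x = ((x >>  1) & h1) | ((x & h1) <<  1);
    x = ((x >>  2) & h2) | ((x & h2) <<  2);
    x = ((x >>  4) & h4) | ((x & h4) <<  4);
    x = ((x >>  8) & v1) | ((x & v1) <<  8);
    x = ((x >> 16) & v2) | ((x & v2) << 16);
    x = ( x >> 32)       | ( x       << 32);
    return x;
}

// Hypothetical helper type: the checksum exists only to keep the work "used".
struct BenchResult {
    std::uint64_t checksum;
    double elapsed_ms;
};

// Hypothetical helper: times `iterations` dependent calls to rotate180.
BenchResult time_rotate180(std::uint64_t iterations) {
    using std::chrono::steady_clock;

    std::uint64_t x = 0xFFFF000000000000ULL;
    std::uint64_t checksum = 0;

    const auto t1 = steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        x = rotate180(x ^ i);  // input depends on the previous result and on i
        checksum += x;         // accumulate so every result contributes
    }
    const auto t2 = steady_clock::now();

    const std::chrono::duration<double, std::milli> ms = t2 - t1;
    return {checksum, ms.count()};
}
```

Calling `time_rotate180(100000000)` from `main` and printing both fields keeps the work observable. The optimizer may still transform the loop, so treat the number as a rough guide and confirm with a profiler or the generated assembly.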

J. Tully


Ok, thank you. Any suggestions for a more suitable tool/library? – John The Fisherman Aug 17 '22 at 17:56
Well, this may be overkill but it's really nice: https://github.com/wolfpld/tracy – J. Tully Aug 17 '22 at 17:59
This one is far less complicated: https://marketplace.visualstudio.com/items?itemName=ArtemGevorkyan.MicroProfilerx64x86 – J. Tully Aug 17 '22 at 18:17
That one looks great. – John The Fisherman Aug 17 '22 at 18:29
