I'm currently translating my chess engine from Java to C++. After porting some parts of it, I noticed that calling those functions took much longer than it should. When I put the called function in the same file as `main`, though, it is much faster (0.0002 ms, versus 244.569 ms with the function in a separate file). Why is that?

Here's the code:

main.cpp

#include <iostream>
#include <chrono>

using namespace std;

#include "utilities.h"

int main() {
    using std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;
    
    auto t1 = steady_clock::now();
    int end = 0;
    for (int i = 0; i < 100000000; i++) {
        unsigned long b = rotate180(0xFFFF000000000000);
    }

    auto t2 = steady_clock::now();
    auto ms_int = duration_cast<milliseconds>(t2 - t1);
    duration<double, std::milli> ms_double = t2 - t1;

    std::cout << ms_int.count() << "ms\n";
    std::cout << ms_double.count() << "ms\n";
}

utilities.cpp

#include <iostream>
using namespace std;

#include "utilities.h"

// from https://www.chessprogramming.org/Flipping_Mirroring_and_Rotating
unsigned long rotate180(unsigned long x) {
   const unsigned long h1 = 0x5555555555555555;
   const unsigned long h2 = 0x3333333333333333;
   const unsigned long h4 = 0x0F0F0F0F0F0F0F0F;
   const unsigned long v1 = 0x00FF00FF00FF00FF;
   const unsigned long v2 = 0x0000FFFF0000FFFF;
   x = ((x >>  1) & h1) | ((x & h1) <<  1);
   x = ((x >>  2) & h2) | ((x & h2) <<  2);
   x = ((x >>  4) & h4) | ((x & h4) <<  4);
   x = ((x >>  8) & v1) | ((x & v1) <<  8);
   x = ((x >> 16) & v2) | ((x & v2) << 16);
   x = ( x >> 32)       | ( x       << 32);
   return x;
}

utilities.h

#ifndef UTILITIES_H
#define UTILITIES_H

unsigned long rotate180(unsigned long x);

#endif

I'm aware that this example doesn't do much, but the slowdown already shows up here, so I expect to run into the same performance loss once I do more complex calculations.

  • 5
    Most likely when you put everything in one file, compiler notices that `rotate180` function has no observable effects and the return value is unused, so it doesn't run the loop at all. But you should check with profiler or assembly to confirm. – Yksisarvinen Aug 17 '22 at 17:40
  • 6
    Read about [the "as-if rule"](https://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule). Your for-loop _effectively_ does nothing observable, so your C++ compiler is allowed to replace it with _nothing_. – Drew Dormann Aug 17 '22 at 17:44
  • 2
    Note: [`unsigned long` may not be able to store 64 bits.](https://en.cppreference.com/w/c/language/arithmetic_types) You should be using `uint64_t` here (`#include <cstdint>`). In fact, `sizeof(unsigned long)` yields 4, i.e. 32 bits, for my x64 build using MSVC – fabian Aug 17 '22 at 17:53
  • Ok, I obviously didn't know that, but makes perfect sense. Thank you – John The Fisherman Aug 17 '22 at 17:54
  • @fabian -- or, in this case, perhaps, `uint_fast64_t`. – Pete Becker Aug 17 '22 at 18:59
  • If you put trivial functions into separate compilation units you should look into link time optimization (-flto in gcc/clang) to do whole program optimizations. Otherwise you get no optimization across compilation units at all. – Goswin von Brederlow Aug 18 '22 at 15:26
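The type-width and cross-translation-unit points raised in the comments can both be handled in the header. What follows is only a sketch, under two assumptions that are not from the question: that moving the definition into `utilities.h` is acceptable, and that the code is built as C++14 or later (needed for this form of `constexpr`). Defining the function in the header is an alternative to link-time optimization, not a replacement for measuring; it simply lets every file that includes the header see the function body.

```cpp
#ifndef UTILITIES_H
#define UTILITIES_H

#include <cstdint>

// Assumption: the definition lives in the header instead of utilities.cpp.
// A constexpr function is implicitly inline, so including this header from
// several .cpp files does not cause multiple-definition errors.
// std::uint64_t is exactly 64 bits wide wherever it exists, unlike
// unsigned long, which is 32 bits with MSVC.
constexpr std::uint64_t rotate180(std::uint64_t x) noexcept {
    constexpr std::uint64_t h1 = 0x5555555555555555ULL;
    constexpr std::uint64_t h2 = 0x3333333333333333ULL;
    constexpr std::uint64_t h4 = 0x0F0F0F0F0F0F0F0FULL;
    constexpr std::uint64_t v1 = 0x00FF00FF00FF00FFULL;
    constexpr std::uint64_t v2 = 0x0000FFFF0000FFFFULL;
    x = ((x >>  1) & h1) | ((x & h1) <<  1);
    x = ((x >>  2) & h2) | ((x & h2) <<  2);
    x = ((x >>  4) & h4) | ((x & h4) <<  4);
    x = ((x >>  8) & v1) | ((x & v1) <<  8);
    x = ((x >> 16) & v2) | ((x & v2) << 16);
    x = ( x >> 32)       | ( x       << 32);
    return x;
}

// Compile-time check: rotating the board by 180 degrees moves bit 0 to bit 63.
static_assert(rotate180(0x1ULL) == 0x8000000000000000ULL,
              "rotate180 should reverse the order of all 64 bits");

#endif
```

With this layout `utilities.cpp` would no longer define `rotate180`. Whether the call actually gets inlined is still up to the compiler and the optimization level in use.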

1 Answer

2

It's simply the optimizer removing dead code, i.e. code that has no effect or whose result is unused. The timing you see for the separate-file version is (approximately) real; in the single-file version the loop has been removed, so there is nothing left to measure. If you want to profile more accurately, there are better ways of measuring a hotspot.
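One simple way to keep the optimizer from discarding the loop is to make its result observable. The sketch below is a hedged illustration, not a rigorous benchmark: `BenchResult` and `time_rotate180` are hypothetical names introduced here, the input is varied per iteration so the call cannot be folded into a single constant, and `std::uint64_t` replaces `unsigned long` as suggested in the comments.

```cpp
#include <chrono>
#include <cstdint>

// Same bit manipulation as in the question, with a fixed-width 64-bit type.
std::uint64_t rotate180(std::uint64_t x) {
    const std::uint64_t h1 = 0x5555555555555555ULL;
    const std::uint64_t h2 = 0x3333333333333333ULL;
    const std::uint64_t h4 = 0x0F0F0F0F0F0F0F0FULL;
    const std::uint64_t v1 = 0x00FF00FF00FF00FFULL;
    const std::uint64_t v2 = 0x0000FFFF0000FFFFULL;
    x = ((x >>  1) & h1) | ((x & h1) <<  1);
    x = ((x >>  2) & h2) | ((x & h2) <<  2);
    x = ((x >>  4) & h4) | ((x & h4) <<  4);
    x = ((x >>  8) & v1) | ((x & v1) <<  8);
    x = ((x >> 16) & v2) | ((x & v2) << 16);
    x = ( x >> 32)       | ( x       << 32);
    return x;
}

// Hypothetical helper type: the checksum is returned so the caller can print it.
struct BenchResult {
    std::uint64_t checksum;
    double milliseconds;
};

// Hypothetical helper: times `iterations` calls and folds every result into
// a checksum, so the return value of rotate180 is actually used.
BenchResult time_rotate180(int iterations) {
    using std::chrono::steady_clock;
    const auto t1 = steady_clock::now();
    std::uint64_t checksum = 0;
    for (int i = 0; i < iterations; ++i) {
        // Vary the input so each call depends on the loop counter.
        checksum ^= rotate180(0xFFFF000000000000ULL + static_cast<std::uint64_t>(i));
    }
    const auto t2 = steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = t2 - t1;
    return {checksum, elapsed.count()};
}
```

Calling `time_rotate180(100000000)` from `main` and printing both fields with `std::cout` makes the loop's work part of the program's observable output. The compiler may still inline or vectorize the loop, so the two file layouts can legitimately differ in speed, but neither should report a near-zero time.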

J. Tully