3

Code was run on Visual Studio 2019 Version 16.11.8 with /O2 optimization on an Intel CPU. I am trying to find the root cause of a counter-intuitive result: according to a t-test, the version without the attributes is statistically significantly faster than the version with them. I am not sure what causes this. Could it be some kind of caching effect? Or some magic the compiler is doing - I cannot really read assembly.

    #include <chrono>
    #include <cmath>
    #include <functional>
    #include <iomanip>
    #include <iostream>
    #include <numeric>
    #include <random>
    #include <vector>
    
    static const size_t NUM_EXPERIMENTS = 1000;
    
    double calc_mean(std::vector<double>& vec) {
        double sum = 0;
        for (auto& x : vec)
            sum += x;
        return sum / vec.size();
    }
    
    double calc_deviation(std::vector<double>& vec) {
        double sum = 0;
        for (int i = 0; i < vec.size(); i++)
            sum = sum + (vec[i] - calc_mean(vec)) * (vec[i] - calc_mean(vec));
        return sqrt(sum / (vec.size()));
    }
    
    double calc_ttest(std::vector<double> vec1, std::vector<double> vec2){
        double mean1 = calc_mean(vec1);
        double mean2 = calc_mean(vec2);
        double sd1 = calc_deviation(vec1);
        double sd2 = calc_deviation(vec2);
        double t_test = (mean1 - mean2) / sqrt((sd1 * sd1) / vec1.size() + (sd2 * sd2) / vec2.size());
        return t_test;
    }
    
    namespace with_attributes {
        double calc(double x) noexcept {
            if (x > 2) [[unlikely]]
                return sqrt(x);
            else [[likely]]
                return pow(x, 2);
        }
    }  // namespace with_attributes
    
    
    namespace no_attributes {
        double calc(double x) noexcept {
            if (x > 2)
                return sqrt(x);
            else
                return pow(x, 2);
        }
    }  // namespace no_attributes
    
    std::vector<double> benchmark(std::function<double(double)> calc_func) {
        std::vector<double> vec;
        vec.reserve(NUM_EXPERIMENTS);
    
        std::mt19937 mersenne_engine(12);
        std::uniform_real_distribution<double> dist{ 1, 2.2 };
    
        for (size_t i = 0; i < NUM_EXPERIMENTS; i++) {
    
            const auto start = std::chrono::high_resolution_clock::now();
            for (auto size{ 1ULL }; size != 100000ULL; ++size) {
                double x = dist(mersenne_engine);
                calc_func(x);
            }
            const std::chrono::duration<double> diff =
                std::chrono::high_resolution_clock::now() - start;
            vec.push_back(diff.count());
        }
        return vec;
    }
    
    int main() {
    
        std::vector<double> vec1 = benchmark(with_attributes::calc);
        std::vector<double> vec2 = benchmark(no_attributes::calc);
        std::cout << "with attribute: " << std::fixed << std::setprecision(6) << calc_mean(vec1) << '\n';
        std::cout << "without attribute: " << std::fixed << std::setprecision(6) << calc_mean(vec2) << '\n';
        std::cout << "T statistics" << std::fixed << std::setprecision(6) << calc_ttest(vec1, vec2) << '\n';
    }
user438383
  • 5,716
  • 8
  • 28
  • 43
dizhouw2
  • 53
  • 4
  • what compiler flags did you use and what are the results? – 463035818_is_not_an_ai Jun 02 '22 at 06:58
  • 1
    `calc_ttest` consumes `vec1` and `vec2` by value instead of by reference. That means a copy of those vectors will be passed in - I'm not sure if that's part of your measurements or not (a by-reference sketch follows these comments). – selbie Jun 02 '22 at 07:23
  • 1
    Because applying such attributes indiscriminately isn't a good/right thing? – ixSci Jun 02 '22 at 07:35
  • From what I can see (using VS 2022, but anyway) the compiler generates exactly the same code for both functions, so it's just the order of the tests that matters. The second run is faster than the first. – BoP Jun 02 '22 at 07:37
  • 1
    I also reliably get that result, and I also reliably get the opposite result just by switching the order of the measurements. The reasonable conclusion is that the prediction of that branch is not something that matters. – molbdnilo Jun 02 '22 at 09:56
  • @BoP: Do you mean that VS2022 generates the same code **for this example**, or in general? The attribute should guide the optimizer when it needs to know which case is more likely, but when the optimizer determines that it doesn't care then the attribute doesn't matter and the same code can be generated. – MSalters Jun 02 '22 at 14:19
  • Where would you even apply likely/unlikely attributes? There is no `if` in the code, and the compiler already assumes the `for` loops will likely loop a bunch of times. – Goswin von Brederlow Jun 02 '22 at 14:33
  • @MSalters - The "identical code" is for this particular example. I have seen other cases where an `[[unlikely]]` code section is moved away from the hot path. Perhaps if it has more than one line of code? :-) – BoP Jun 02 '22 at 15:31
  • Curious why the second run is faster than the first... – dizhouw2 Jun 02 '22 at 23:40
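
As a side note on selbie's comment: here is a minimal sketch (not the OP's original code) of drop-in replacements for the helpers in the question that take their arguments by `const` reference, avoiding the copies, and that also hoist the repeated `calc_mean` call out of the deviation loop:

    double calc_mean(const std::vector<double>& vec) {
        double sum = 0;
        for (const double x : vec)
            sum += x;
        return sum / vec.size();
    }

    double calc_deviation(const std::vector<double>& vec) {
        const double mean = calc_mean(vec);  // compute once, not once per element
        double sum = 0;
        for (const double x : vec)
            sum += (x - mean) * (x - mean);
        return sqrt(sum / vec.size());
    }

    // By const reference: no copies of the 1000-element timing vectors are made.
    double calc_ttest(const std::vector<double>& vec1, const std::vector<double>& vec2) {
        const double mean1 = calc_mean(vec1);
        const double mean2 = calc_mean(vec2);
        const double sd1 = calc_deviation(vec1);
        const double sd2 = calc_deviation(vec2);
        return (mean1 - mean2) / sqrt((sd1 * sd1) / vec1.size() + (sd2 * sd2) / vec2.size());
    }

Either way, the copies happen after the timings have already been collected inside `benchmark`, so they cannot explain the measured difference.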

2 Answers

3

Per godbolt, the two functions generate identical assembly under msvc:

        movsd   xmm1, QWORD PTR __real@4000000000000000
        comisd  xmm0, xmm1
        jbe     SHORT $LN2@calc
        xorps   xmm1, xmm1
        ucomisd xmm1, xmm0
        ja      SHORT $LN7@calc
        sqrtsd  xmm0, xmm0
        ret     0
    $LN7@calc:
        jmp     sqrt
    $LN2@calc:
        jmp     pow

Since msvc is not open source, one can only guess why it chooses to ignore the hint -- perhaps because both branches end in function calls (tail calls, hence `jmp` instead of `call`), whose cost dwarfs anything [[likely]] could save. Clang, however, is smart enough to optimize `pow(x, 2)` into `x * x`, so it generates different code for the two versions. Following that lead, if your code is modified into

    double calc(double x) noexcept {
        if (x > 2)
            return x + 1;
        else
            return x - 2;
    }

msvc will also emit a different layout for the two versions.
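
To see this for yourself, here is a sketch of the pair one might compare on godbolt -- the same modified body, with and without the attributes (namespace names as in the question):

    namespace with_attributes {
        double calc(double x) noexcept {
            if (x > 2) [[unlikely]]
                return x + 1;
            else [[likely]]
                return x - 2;
        }
    }  // namespace with_attributes

    namespace no_attributes {
        double calc(double x) noexcept {
            if (x > 2)
                return x + 1;
            else
                return x - 2;
        }
    }  // namespace no_attributes

With /O2, the branch marked [[likely]] can now be placed on the fall-through path, which is the layout difference referred to above.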

Austaras
  • 901
  • 8
  • 24
1

Compilers are smart. These days, they are very smart. They do a lot of work to figure out things like branch probabilities on their own.

The likely and unlikely attributes exist to solve extremely specific problems. Problems that only become apparent after deep analysis of the performance characteristics, and generated assembly, of a particular piece of performance-critical code. They are not a salve you rub into any old code to make it go faster.

They are a scalpel. And without surgical training, a scalpel is likely to be misused.

So unless you have specific knowledge of a performance problem which analysis of assembly shows can be solved by better branch prediction, you should not assume that any use of these attributes will make any particular code go faster.

That is, the result you're getting is entirely legitimate.
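
For what it's worth, a minimal sketch of the kind of case the attributes are meant for -- a cold path that profiling has shown is almost never taken (the function and its names here are hypothetical, purely for illustration):

    #include <limits>
    #include <stdexcept>

    // Suppose profiling shows the overflow check almost never fires:
    // [[unlikely]] asks the compiler to keep the throw off the hot code path.
    int checked_increment(int counter) {
        if (counter == std::numeric_limits<int>::max()) [[unlikely]] {
            throw std::overflow_error("counter overflow");
        }
        return counter + 1;
    }

Even then, the only way to know the hint helped is to inspect the generated assembly and measure -- exactly the analysis described above.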

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982