8

Consider the following code:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v(12);
    std::iota(v.begin(), v.end(), 0);

    //std::next_permutation(v.begin(), v.end());

    using clock = std::chrono::high_resolution_clock;
    clock c;
    auto start = c.now();

    unsigned long counter = 0;
    do {
        ++counter;
    } while (std::next_permutation(v.begin(), v.end()));

    auto end = c.now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);    
    std::cout << counter << " permutations took " << duration.count() / 1000.0f << " s";
}

Compiled with GCC (MinGW) 5.3 -O2 on my AMD 4.1 GHz CPU this takes 2.3 s. However if I comment in the uncommented line it slows down to 3.4 s. I would expect a minimal speed-up because we measure the time for one permutation less. With -O3 the difference is less extreme 2.0 s to 2.4 s.

Can anyone explain that? Could a super-smart compiler detect that I want to traverse all permutations and optmize this code?

gsamaras
  • 71,951
  • 46
  • 188
  • 305
Albjenow
  • 369
  • 2
  • 12
  • 1
    Calling `next_permutation` twice caused it to not be inlined – harold Jun 25 '17 at 18:13
  • Do you see a similar behavior with -O0? – Serge Jun 25 '17 at 18:59
  • Fwiw, Compiler Explorer clearly shows the lack of inlining in -O2 - see [commented code](https://godbolt.org/g/RJQ3cJ) and [uncommented code](https://godbolt.org/g/6ZVA31). The weirder case is actually with -O3, in which all calls are inlined but the assembly is arranged slightly differently. Why the reorder produces worse performance (according to the OP's report) is unclear. – Eran Jun 25 '17 at 19:26
  • @Serge no with -O0 both run equally slow – Albjenow Jun 25 '17 at 19:39
  • Thanks, i find it educational. BTW, i see no difference whatsoever with -O3. Actually the 'slow' version runs a fraction of a second faster than the fast one :-). – Serge Jun 25 '17 at 19:59

1 Answers1

2

I think the compiler gets confused by you calling the function in two separate lines in your code, causing it not be inline.

GCC 8.0.0 also behaves as yours.

Benefits of inline functions in C++? It provides a simple mechanism for the compiler to apply more optimizations, so losing the inline identification may cause a severe drop of performance, in some cases.

gsamaras
  • 71,951
  • 46
  • 188
  • 305
  • That seems to be indeed a part of the problem: Putting the line into a non-optimizable but always false `if` statement leads to the same drop although the function is never called. Got spot. _But_ the code runs still `0.1 s` slower if the function is executed. – Albjenow Jun 25 '17 at 18:32