Extreme slow-down when starting at second permutation

Question

Consider the following code:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v(12);
    std::iota(v.begin(), v.end(), 0);

    //std::next_permutation(v.begin(), v.end());

    using clock = std::chrono::high_resolution_clock;
    clock c;
    auto start = c.now();

    unsigned long counter = 0;
    do {
        ++counter;
    } while (std::next_permutation(v.begin(), v.end()));

    auto end = c.now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);    
    std::cout << counter << " permutations took " << duration.count() / 1000.0f << " s";
}

Compiled with GCC (MinGW) 5.3 -O2 on my AMD 4.1 GHz CPU this takes 2.3 s. However if I comment in the uncommented line it slows down to 3.4 s. I would expect a minimal speed-up because we measure the time for one permutation less. With -O3 the difference is less extreme 2.0 s to 2.4 s.

Can anyone explain that? Could a super-smart compiler detect that I want to traverse all permutations and optmize this code?

Calling `next_permutation` twice caused it to not be inlined — harold, Jun 25 '17 at 18:13
Fwiw, Compiler Explorer clearly shows the lack of inlining in -O2 - see [commented code](https://godbolt.org/g/RJQ3cJ) and [uncommented code](https://godbolt.org/g/6ZVA31). The weirder case is actually with -O3, in which all calls are inlined but the assembly is arranged slightly differently. Why the reorder produces worse performance (according to the OP's report) is unclear. — Eran, Jun 25 '17 at 19:26
Thanks, i find it educational. BTW, i see no difference whatsoever with -O3. Actually the 'slow' version runs a fraction of a second faster than the fast one :-). — Serge, Jun 25 '17 at 19:59

gsamaras · Accepted Answer · 2017-06-25T18:29:43.907

2

I think the compiler gets confused by you calling the function in two separate lines in your code, causing it not be inline.

GCC 8.0.0 also behaves as yours.

Benefits of inline functions in C++? It provides a simple mechanism for the compiler to apply more optimizations, so losing the inline identification may cause a severe drop of performance, in some cases.

edited Jun 25 '17 at 18:29

answered Jun 25 '17 at 18:24

gsamaras

71,951
46
188
305

That seems to be indeed a part of the problem: Putting the line into a non-optimizable but always false `if` statement leads to the same drop although the function is never called. Got spot. _But_ the code runs still `0.1 s` slower if the function is executed. – Albjenow Jun 25 '17 at 18:32

Extreme slow-down when starting at second permutation

1 Answers1