Consider the following code:
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>
int main() {
std::vector<int> v(12);
std::iota(v.begin(), v.end(), 0);
//std::next_permutation(v.begin(), v.end());
using clock = std::chrono::high_resolution_clock;
clock c;
auto start = c.now();
unsigned long counter = 0;
do {
++counter;
} while (std::next_permutation(v.begin(), v.end()));
auto end = c.now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
std::cout << counter << " permutations took " << duration.count() / 1000.0f << " s";
}
Compiled with GCC (MinGW) 5.3 -O2
on my AMD 4.1 GHz CPU this takes 2.3 s
. However if I comment in the uncommented line it slows down to 3.4 s
. I would expect a minimal speed-up because we measure the time for one permutation less. With -O3
the difference is less extreme 2.0 s
to 2.4 s
.
Can anyone explain that? Could a super-smart compiler detect that I want to traverse all permutations and optmize this code?