Yes, there are several things you can improve significantly. However, you should keep in mind that the number of combinations you are calculating is so large, that it has to be slow if it is to enumerate all subsets. On my machine and with my personal patience budget (100,5)
is out of reach.
Given that, here are the things you can improve without completely rewriting your entire algorithm.
First: Cache locality
A vector<vector<T>>
will not be contiguous. The nested vector is rather small, so even with preallocation this will always be bad, and iterating over it will be slow because each new sub-vector (and there are a lot) will likely cause a cache miss.
Hence, use a single vector<T>
. Your k
th subset will then not sit at location k
but at k*r
. But this is a significant speedup on my machine.
Second: Use a cpu-friendly permutation vector
Your idea to use next_permutation
is not bad. But the fact that you use a vector<bool>
makes this extremely slow. Paradoxically, using a vector<size_t>
is much faster, because it is easier to load a size_t
and check it than it is to do the same with a bool
.
So, if you take these together the code looks something like this:
auto func2(std::size_t n, std::size_t r){
std::vector<std::size_t> reas;
reas.reserve((1<<r)*n);
std::vector<std::size_t> v(n);
std::fill(v.end() - r, v.end(), 1);
do {
for (std::size_t i = 0; i < n; ++i) {
if (v[i]) {
reas.push_back(i+1);
}
}
} while (std::next_permutation(v.begin(), v.end()));
return reas;
}
Third: Don't press the entire result into one huge buffer
Use a callback to process each sub-set. Thereby you avoid having to return one huge vector. Instead you call a function for each individual sub-set that you found. If you really really need to have one huge set, this callback can still push the sub-sets into a vector, but it can also operate on them in-place.
std::size_t func3(std::size_t n, std::size_t r,
std::function<void(std::vector<std::size_t> const&)> fun){
std::vector<std::size_t> reas;
reas.reserve(r);
std::vector<std::size_t> v(n);
std::fill(v.end() - r, v.end(), 1);
std::size_t num = 0;
do {
reas.clear(); // does not shrink capacity to 0
for (std::size_t i = 0; i < n; ++i) {
if (v[i]) {
reas.push_back(i+1);
}
}
++num;
fun(reas);
} while (std::next_permutation(v.begin(), v.end()));
return num;
}
This yields a speedup of well over 2x in my experiments. But the speedup goes up the more you crank up n
and r
.
Also: Use compiler optimisation
Use your compiler options to speed up the compilation as much as possible. On my system the jump from -O0 to -O1 is a speedup of well more than 10x. The jump to -O3 from -O1 is much smaller but still there (about x1.1).
Unrelated to performance, but still relevant: Why is "using namespace std;" considered bad practice?