Optimize C++ function to generate combinations

Question

I'm trying to get a function to generate all possible combinations of numbers but my problem is the too long elaboration time. So I think I've to optimize it. Problem: Generate all set of "r" size with 1 to n elements without repeat it in reverse order (1,2 is equal to 2, 1).

Example:

n = 3 //elements: 1,2,3
r = 2 //size of set

Output:
2 3
1 3
1 2

The code I'm using is the following:

    void func(int n, int r){
        vector <vector <int>> reas;
        vector<bool> v(n);
        fill(v.end() - r, v.end(), true);

        int a = 0;
        do {
            reas.emplace_back();
            for (int i = 0; i < n; ++i) {
                if (v[i]) {
                    reas[a].push_back(i+1);
                }
            }
            a++;
        } while (next_permutation(v.begin(), v.end()));
    }

If n = 3 and r = 2 the output will be the same of the example upside. My problem is that if I put n = 50 and r = 5 the elaboration time is too high and I need to work with a range of n = 50...100 and r= 1..5; Is there a way to optimize this function?

Thank's a lot

score 1 · Accepted Answer · answered Apr 05 '20 at 09:39

Yes, there are several things you can improve significantly. However, you should keep in mind that the number of combinations you are calculating is so large, that it has to be slow if it is to enumerate all subsets. On my machine and with my personal patience budget (100,5) is out of reach.

Given that, here are the things you can improve without completely rewriting your entire algorithm.

First: Cache locality

A vector<vector<T>> will not be contiguous. The nested vector is rather small, so even with preallocation this will always be bad, and iterating over it will be slow because each new sub-vector (and there are a lot) will likely cause a cache miss.

Hence, use a single vector<T>. Your kth subset will then not sit at location k but at k*r. But this is a significant speedup on my machine.

Second: Use a cpu-friendly permutation vector

Your idea to use next_permutation is not bad. But the fact that you use a vector<bool> makes this extremely slow. Paradoxically, using a vector<size_t> is much faster, because it is easier to load a size_t and check it than it is to do the same with a bool.

So, if you take these together the code looks something like this:

  auto func2(std::size_t n, std::size_t r){
    std::vector<std::size_t> reas;
    reas.reserve((1<<r)*n);
    std::vector<std::size_t> v(n);
    std::fill(v.end() - r, v.end(), 1); 

    do {
      for (std::size_t i = 0; i < n; ++i) {
        if (v[i]) {
          reas.push_back(i+1);
        }
      }   
    } while (std::next_permutation(v.begin(), v.end()));
    return reas;
  }

Third: Don't press the entire result into one huge buffer

Use a callback to process each sub-set. Thereby you avoid having to return one huge vector. Instead you call a function for each individual sub-set that you found. If you really really need to have one huge set, this callback can still push the sub-sets into a vector, but it can also operate on them in-place.

  std::size_t func3(std::size_t n, std::size_t r, 
                    std::function<void(std::vector<std::size_t> const&)> fun){
    std::vector<std::size_t> reas;
    reas.reserve(r);
    std::vector<std::size_t> v(n);
    std::fill(v.end() - r, v.end(), 1);

    std::size_t num = 0;
    do {
      reas.clear(); // does not shrink capacity to 0
      for (std::size_t i = 0; i < n; ++i) {
        if (v[i]) {
          reas.push_back(i+1);
        }
      }
      ++num;
      fun(reas);
    } while (std::next_permutation(v.begin(), v.end()));
    return num;
  }

This yields a speedup of well over 2x in my experiments. But the speedup goes up the more you crank up n and r.

Also: Use compiler optimisation

Use your compiler options to speed up the compilation as much as possible. On my system the jump from -O0 to -O1 is a speedup of well more than 10x. The jump to -O3 from -O1 is much smaller but still there (about x1.1).

Unrelated to performance, but still relevant: Why is "using namespace std;" considered bad practice?

Optimize C++ function to generate combinations

1 Answers1

First: Cache locality

Second: Use a cpu-friendly permutation vector

Third: Don't press the entire result into one huge buffer

Also: Use compiler optimisation