2

Algorithm desire

I'm trying to write an algorithm that finds all k length combinations of items in an array, but that must also use n items before index j, for as many of these (n,j) pairs we desire.

Examples

  1. Find all two item combinations from {1,2,3,4} using at least one item before index two (aka restrictions = {{1,2}}). This should result in {{1,2},{1,3},{1,4},{2,3},{2,4}}.

  2. Find all three item combinations from {1,2,3,4,5,6,7} using at least one item before index 2 and two from before index 4, aka restrictions = {{1,2},{2,4}}.

Progress

I've been able to find all combinations of choosing one item each from a bunch of sets.

void add_combinations_to(vector<int> prefix, vector<vector<int>>::iterator start, 
                         vector<vector<int>>::iterator stop, vector<vector<int>>& ret) {
    if(start == stop) {ret.push_back(prefix); return;}
    for(auto v : *start) {
        auto next = prefix;
        next.push_back(v);
        add_combinations_to(next, start + 1, stop, ret);
    }
};

int main() {
    vector<vector<int>> v{{1,2},{3,4}};
    vector<vector<int>> ret;
    vector<int> init{};
    add_combinations_to(init, v.begin(), v.end(), ret);
    // ret now contains {{1,3},{1,4},{2,3},{2,4}}
}

I feel there is a way I can extend this. For now I need to take a step back, but advice would be great. For the most part, this is just an

Peter Moran
  • 295
  • 3
  • 13
  • It seems that the restrictions can be simplized. For your example {1,2,3,4,5,6,7}, – Jun Ge Apr 18 '17 at 08:03
  • I would use `std::next_permutation`. – Jarod42 Apr 18 '17 at 08:11
  • It seems that the restrictions can be simplified. For your example data set ={1,2,3,4,5,6,7}, restrictions = {{1,2},{2,4}}. The restrictions can be {{1,2},{(2-1),4}}, because {1,2} assumes at least one item in first 2 is used, so for {2,4} just need to use one item. – Jun Ge Apr 18 '17 at 08:13
  • 1
    The solution for example 2 is {{1,2,3}, {1,2,4},{1,3,4}, {2,3,4}} ? – maraca Apr 18 '17 at 12:40

1 Answers1

2

As with many sequence enumeration problems, this one yields to the standard lexicographic enumeration algorithm (quoted from Algorithm for "consolidating" N items into K):

  • Start with the lexicographically smallest possible sequence
  • While possible:
    • a. Scan backwards to find the last element which could be increased. ("Could be" means that increasing that element would still result in the prefix of some valid sequence.)
    • b. Increment that element to the next largest possible value
    • c. Fill in the rest of the sequence with the smallest possible suffix.

If there are no other restrictions, the length-j sequence p == p0, p1, …, pj−1 could be a prefix of a length-k combination of n things iff k−j < n−pj−1. We can rewrite that as pj−1 < n − k &plus; j).

So the following simple function would accomplish steps a and b of the above generic algorithm for an unrestricted k, n combination:

/* Finds the (lexicographically) next prefix of a k-combination from n
 * values, and returns the length of that prefix.
 * If there is no possible next prefix, returns 0.
 * Note: k is implicit in the length of the combination
 */
size_t nextPrefix(std::vector<size_t>& comb, size_t n) {
  size_t k = comb.size();
  for (size_t j = k; j;) {
    --j;
    if (comb[j] < n - k + j) {
      ++comb[j];
      return j + 1;
    }
  }
  return 0;
}

To move on to the actual problem, we note that any restriction of the form "the combination includes ki of the first ni values" will end up being exactly the same test, since that restriction could be rephrased as "the ki-length prefix is a combination of ni values."

So we could replace the n parameter in the above function with a vector of pairs (where the last pair is precisely <k, n>):

size_t nextPrefix(std::vector<size_t>& comb,
                   std::vector<std::pair<size_t, size_t>> const& restrictions) {
  for (size_t j = comb.size(); j;) {
    --j;
    /* Test all applicable conditions */
    if (std::all_of(restrictions.begin(), restrictions.end(), 
                    [&j, &comb](std::pair<size_t, size_t> const& rest) {
                      return j >= rest.first
                             or comb[j] < rest.second - rest.first + j;})) {
        ++comb[j];
        return j + 1;
    } 
  }
  return 0;
}

(This is not optimal, obviously. Instead of checking all the restrictions on every element, we could just compute a simple vector of maximum values, and then test the element against that maximum.)

To actually generate the sequences, we need to be able to fill in the suffix (the last step of the generic algorithm) and construct the loop:

void firstSuffix(std::vector<size_t>& comb, size_t pfx_length) {
  size_t val = pfx_length ? comb[pfx_length - 1] + 1 : 0;
  for (auto it = comb.begin() + pfx_length; it != comb.end(); ++it)
    *it = val++;
}

And then we can write the loop:

int main() {
  std::vector<std::pair<size_t, size_t>> restrictions = {{1, 2}, {2, 4}, {3, 7}};
  size_t k = std::max_element(restrictions.begin(), restrictions.end())->first;
  if (k == 0) return 0; /* Empty set */
  std::vector<size_t> comb(k);
  std::size_t pfx = 0;
  do {
    firstSuffix(comb, pfx);
    for (auto const& val : comb) std::cout << std::setw(3) << val;
    std::cout << '\n';
  } while (pfx = nextPrefix(comb, restrictions));
  return 0;
}

(live on coliru)

Community
  • 1
  • 1
rici
  • 234,347
  • 28
  • 237
  • 341
  • Amazing! Thanks for the in depth answer and all of the explanation. It helps to know the background topics that lead to a good solution. And of course, example code is great. – Peter Moran Apr 20 '17 at 03:13