Finding all subsets of specified size

Question

I've been scratching my head about this for two days now and I cannot come up with a solution. What I'm looking for is a function f(s, n) such that it returns a set containing all subsets of s where the length of each subset is n.

Demo:

s={a, b, c, d}

f(s, 4)
{{a, b, c, d}}

f(s, 3) 
{{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}}

f(s, 2)
{{a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}}

f(s, 1)
{{a}, {b}, {c}, {d}}

I have a feeling that recursion is the way to go here. I've been fiddling with something like

f(S, n):
   for s in S:
       t = f( S-{s}, n-1 ) 
       ...

But this does not seem to do the trick. I did notice that len(f(s,n)) seems to be the binomial coefficient bin(len(s), n). I guess this could be utilized somehow.

Can you help me please?

The search term would be "find subset permutations", something like that. Some languages like C++ have built-in library support. Yes, mathematically this could be a recursive algorithm. How to best implement it practically in a certain language is another story. — Lundin, Feb 01 '22 at 10:18
This seems rather similar question: https://stackoverflow.com/questions/61592209/how-can-i-generate-all-permutations-of-length-n-from-a-set-of-k-elements/61599670#61599670 — Damien, Feb 01 '22 at 10:22
I think they are looking for combinations of length `n` since their results are unordered tuples @Lundin — user1984, Feb 01 '22 at 10:22
@Lundin and Damien. If I'm not wrong, permutations are ordered, which sets are not. — klutt, Feb 01 '22 at 10:34
It's the same algorithm regardless. The advantage of sets is that you have no duplicates, as opposed to for example algorithms for finding all sub strings of a string. Also in practice I don't quite see how you can implement a set in a feasible way without sorting it, or adding/removing will be terribly inefficient. C++ `std::set` for example, comes with a mandatory sorting order and you must define a way to compare items of the set in advance, in order to use the container class. — Lundin, Feb 01 '22 at 11:19
@klutt You are right, sets are not ordered. In practice, this implies that you can basically use the same kind of algorithms, except that they are simpler. — Damien, Feb 01 '22 at 13:35

score 1 · Answer 1 · answered Feb 01 '22 at 10:35

One way to solve this is by backtracking. Here's a possible algorithm in pseudo code:

def backtrack(input_set, idx, partial_res, res, n):
  if len(partial_res == n):
    res.append(partial_res[:])
    return
  
  for i in range(idx, len(input_set)):
    partial_res.append(input_set[i])
    backtrack(input_set, idx+1, partial_res, res, n) # path with input_set[i]
    partial_res.pop()
    backtrack(input_set, idx+1, partial_res, res, n) # path without input_set[i]

Time complexity of this approach is O(2^len(input_set)) since we make 2 branches at each element of input_set, regardless of whether the path leads to a valid result or not. The space complexity is O(len(input_set) choose n) since this is the number of valid subsets you get, as you correctly pointed out in your question.

Now, there is a way to optimize the above algorithm to reduce the time complexity to O(len(input_set) choose n) by pruning the recursive tree to paths that can lead to valid results only.

If n - len(partial_res) < len(input_set) - idx + 1, we are sure that even if we took every remaining element in input_set[idx:] we are still short at least one to reach n. So we can employ this as a base case and return and prune.

Also, if n - len(partial_res) == len(input_set) - idx + 1, this means that we need each and every element in input_set[idx:] to get the required n length result. Thus, we can't skip any elements and so the second branch of our recursive call becomes redundant.

backtrack(input_set, idx+1, partial_res, res, n) # path without input_set[i]

We can skip this branch with a conditional check. Implementing these base cases correctly, reduces the time complexity of the algorithm to O(len(input_set) choose k), which is a hard limit because that's the number of subsets that there are.

score 1 · Accepted Answer · edited Feb 02 '22 at 12:59

Let us call n the size of the array and k the number of elements to be out in a subarray.

Let us consider the first element A[0] of the array A.
If this element is put in the subset, the problem becomes a (n-1, k-1) similar problem.
If not, it becomes a (n-1, k) problem.

This can be simply implemented in a recursive function.

We just have to pay attention to deal with the extreme cases k == 0 or k > n.

During the process, we also have to keep trace of:

n: the number of remaining elements of A to consider
k: the number of elements that remain to be put in the current subset
index: the index of the next element of A to consider
The current_subset array that memorizes the elements already selected.

Here is a simple code in c++ to illustrate the algorithm

Output

For 5 elements and subsets of size 3:

#include    <iostream>
#include    <vector>

void print (const std::vector<std::vector<int>>& subsets) {
    for (auto &v: subsets) {
        for (auto &x: v) {
            std::cout << x << " ";
        }
        std::cout << "\n";
    }
}
//  n: number of remaining elements of A to consider
//  k: number of elements that remain to be put in the current subset
//  index: index of next element of A to consider

void Get_subset_rec (std::vector<std::vector<int>>& subsets, int n, int k, int index, std::vector<int>& A, std::vector<int>& current_subset) {
    if (n < k) return;   
    if (k == 0) {
        subsets.push_back (current_subset);
        return;
    }  
    Get_subset_rec (subsets, n-1, k, index+1, A, current_subset);
    current_subset.push_back(A[index]);
    Get_subset_rec (subsets, n-1, k-1, index+1, A, current_subset);
    current_subset.pop_back();         // remove last element
    return;
}

void Get_subset (std::vector<std::vector<int>>& subsets, int subset_length, std::vector<int>& A) {
    std::vector<int> current_subset;
    Get_subset_rec (subsets, A.size(), subset_length, 0, A, current_subset);
}

int main () {
    int subset_length = 3;     // subset size
    std::vector A = {1, 2, 3, 4, 5};
    int size = A.size();
    std::vector<std::vector<int>> subsets;

    Get_subset (subsets, subset_length, A);
    std::cout << subsets.size() << "\n";
    print (subsets);
}

Live demo

I tried to implement this i Java, but I failed. I would be grateful if you could help me out. https://stackoverflow.com/q/70976317/6699433 — klutt, Feb 03 '22 at 19:04

n. m. could be an AI · Answer 3 · 2022-02-01T15:13:54.167

subseqs 0 _      = [[]]
subseqs k []     = []
subseqs k (x:xs) = map (x:) (subseqs (k-1) xs) ++ subseqs k xs

Live demo

The function looks for subsequences of (non-negative) length k in a given sequence. There are three cases:

If the length is 0: there is a single empty subsequence in any sequence.
Otherwise, if the sequence is empty: there are no subsequences of any (positive) length k.
Otherwise, there is a non-empty sequence that starts with x and continues with xs, and a positive length k. All our subsequences are of two kinds: those that contain x (they are subsequences of xs of length k-1, with x stuck at the front of each one), and those that do not contain x (they are just subsequences of xs of length k).

The algorithm is a more or less literal translation of these notes to Haskell. Notation cheat sheet:

[] an empty list
[w] a list with a single element w
x:xs a list with a head of x and a tail of xs
(x:) a function that sticks an x in front of any list
++ list concatenation
f a b c a function f applied to arguments a b and c

FP always seems like magic to me :D – klutt Feb 01 '22 at 17:43 — klutt, Feb 01 '22 at 17:43

score 0 · Answer 4 · answered Dec 03 '22 at 21:57

Here is a non-recursive python function that takes a list superset and returns a generator that produces all subsets of size k.

def subsets_k(superset, k):
  if k > len(superset):
    return
  if k == 0:
    yield []
    return

  indices = list(range(k))
  while True:
    yield [superset[i] for i in indices]

    i = k - 1
    while indices[i] == len(superset) - k + i:
      i -= 1
      if i == -1:
        return
    
    indices[i] += 1
    for j in range(i + 1, k):
      indices[j] = indices[i] + j - i

Testing it:

for s in subsets_k(['a', 'b', 'c', 'd', 'e'], 3):
  print(s)

Output:

['a', 'b', 'c']
['a', 'b', 'd']
['a', 'b', 'e']
['a', 'c', 'd']
['a', 'c', 'e']
['a', 'd', 'e']
['b', 'c', 'd']
['b', 'c', 'e']
['b', 'd', 'e']
['c', 'd', 'e']

Finding all subsets of specified size

4 Answers4

Linked