Haskell - Generating Subsets of Length k

Question

I have a Haskell function that computes subsets of a set.

subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) = [zs | ys <- subsets xs, zs <- [ys, (x:ys)]]

Example of use:

*Main> subsets [1,2,3]
[[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3]]

If I add the definition:

ksubsets k xs = [ys | ys<-subsets xs, length ys==k]

I can compute the subsets of a set of n elements where each subset has exactly k elements

Example of use:
*Main> ksubsets 3 [1,2,3,4,5]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4],[1,2,5],[1,3,5],[2,3,5],[1,4,5],[2,4,5], 
[3,4,5]]

How could I create a more efficient function that generates the subsets of a set with n elements having exactly k elements but without generating all of the subsets. How do I only find the subsets of k elements.

Possible duplicate of [Function to generate the unique combinations of a list in Haskell](https://stackoverflow.com/questions/52602474/function-to-generate-the-unique-combinations-of-a-list-in-haskell) — dopamane, Oct 06 '18 at 06:03

score 5 · Answer 1 · answered Oct 06 '18 at 02:06

Let's consider some recursive definitions, without going into code quite yet.

If you want to get every subset of size n, then you might pick an element, and append that to a subset of size n-1. If we do that for each element, then we should get all subsets of size n. That's a useful starting point!

Let's put that into code now:

-- A base case:
subsetsOfSize 0 _ = [[]]

-- If there are no elements, there are no subsets:
subsetsOfSize _ [] = []

-- The case discussed above:
subsetsOfSize n (x:xs) =
    [x : subs | subs <- subsetsOfSize (n-1) xs] -- ones starting with x,
    ++ subsetsOfSize n xs                       -- ones starting with other elements.

As for efficiency? That's left to you, since this does look a bit like work you should be doing on your own. What do you think is inefficient about this function? Here are a few ideas:

If the length of a list list is m, and m is less than n, then subsetsOfSize n list = []. Do we check for that already?
It's well-known that ++ is not performant. What could we use instead? Is there a way to 'rephrase' this function?

Haskell - Generating Subsets of Length k

1 Answers1