1

I would like to write a function, which takes a list and returns a list of all possible subsets which satisfy given condition. For example for I want to have all 3-size subsets of [1,2,3,4] but without any subset which contains 2 and 3.

$> func [1,2,3,4] 3
[[1,2,4],[1,3,4]]

To generate all subsets with size N I have following function (found here):

kCombinations :: [a] -> Integer -> [[a]]
kCombinations _      0 = [[]]
kCombinations []     _ =  []
kCombinations (x:xs) k = withHead ++ withoutHead
  where withHead    = map (x:) (kCombinations xs (k-1))
        withoutHead = kCombinations xs k

I know that the easiest solution for my problem would be to generate first all combinations and after that to use filter function. But for bigger problems like kCombinations [1..30] 6 it takes ages to finish.

Could you tell me how could I filter out some data during generation of all combinations?

Kris
  • 1,538
  • 2
  • 16
  • 27
  • 2
    Are you asking how to compute O(2^n) values in less than O(2^n) time? – Bakuriu Dec 14 '14 at 22:31
  • 3
    How is your condition given? Is it a condition that allows to rule out some subsets based on just a few elements of it (as in your example)? – Joachim Breitner Dec 14 '14 at 23:07
  • `30 choose 6` is `593775`, so I would expect it to take a little while if not just to print it out (if you're printing the list). In GHCi, it takes my computer about 2 seconds to compute it and much longer to print it (I didn't wait for it to finish). You might be able to improve it some with a difference list since you're appending often, especially since it looks like a lot of it will associate to the left. – David Young Dec 14 '14 at 23:28

2 Answers2

2

This question comes up a lot and the most recent comparison of various methods may be found here: Comparison of techniques for generating combinations

The last method (subsequencesOfSize) mentioned is very performant due to memoization. A timing comparison on my machine with ghci:

length $ kCombinations [1..30] 6        - time: 2.73 secs
length $ subsequencesOfSize 6 [1..30]   - time: 0.40 secs

To solve your original problem (subsets without both 2 and 3), there are basically two ways to compute the answer:

  -- import Data.List

  answer1 = filter (\s -> not (elem 2 s && elem 3 s)) $ subsequencesOfSize 6 [1..30]
  answer2 = map (2:) subs23 ++ map (3:) subs23 ++ subsequencesOfSize 6 nums'
    where nums = [1..30]
          nums' = [1..30] \\ [2,3]
          subs23 = subsequencesOfSize 5 nums'

Timings on my box (ghci again):

length answer1           -- 1.48 secs
length answer2           -- 0.36 secs

answer1 is clearly the naive approach; answer2 applies some basic set theory and easily generalizes to counting subsets not containing any two numbers - you can decide whether it's legal or not.

Community
  • 1
  • 1
ErikR
  • 51,541
  • 9
  • 73
  • 124
2

It's possible to improve the subsequencesOfSize function, that user5402 mentioned. Compare this and this. This is because of (l-n) in the second version, so subsequencesOfSize 3 [1..350] is equal to subsequencesBySize [1..350] !! 347, so a lot of unused lists are created.

When filtering subsequences by a predicate p which has the property that p xs is true implies all p (inits xs) is true, it is possible to integrate the predicate into the generation of subsequences for even greater efficiency. Such is the case for the predicate "not containing 2 and 3".

And here is what you want:

zapWith f    xs     []  = xs
zapWith f (x:xs) (y:ys) = f x y : zapWith f xs ys

filterCombs :: ([a] -> Bool) -> Int -> [a] -> [[a]]
filterCombs p n xs | n > length xs = [] 
filterCombs p n xs = go xs id !! n where
    go    []  ds = [[[]]]
    go (x:xs) ds
        | p (ds' []) = zapWith (++) ([] : map (map (x:)) with) without
        | otherwise  = without
        where
            ds'     = ds . (x:)
            with    = go xs ds'
            without = go xs ds

zapWith is intentionally non-exhaustive. ds in the go function is a difference list, that contains all previous elements. You can read go like this: if a property p holds for <all previous elements of xs> ++ [x] then include combinations with x and without x, otherwise include only without x.

Some examples:

conseq (x:y:xs) = succ x == y && conseq (y:xs)
conseq      xs  = True

main = do
    print $ filterCombs (\xs -> any (`notElem` xs) [2,3]) 3 [1..5]
    print $ filterCombs conseq 4 $ [1..8] ++ [8,7..1]
    print $ filterCombs (all (<= 10)) 9 [1..5000]

prints

[[1,2,4],[1,2,5],[1,3,4],[1,3,5],[1,4,5],[2,4,5],[3,4,5]]
[[1,2,3,4],[1,2,3,4],[2,3,4,5],[2,3,4,5],[3,4,5,6],[3,4,5,6],[4,5,6,7],[4,5,6,7],[5,6,7,8],[5,6,7,8]]
[[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,10],[1,2,3,4,5,6,7,9,10],[1,2,3,4,5,6,8,9,10],[1,2,3,4,5,7,8,9,10],[1,2,3,4,6,7,8,9,10],[1,2,3,5,6,7,8,9,10],[1,2,4,5,6,7,8,9,10],[1,3,4,5,6,7,8,9,10],[2,3,4,5,6,7,8,9,10]]
effectfully
  • 12,325
  • 2
  • 17
  • 40
  • 1
    I added a note about what `filterCombs` expects of the predicate `p` - hope you don't mind. – ErikR Dec 15 '14 at 07:16