6

Is there a Haskell function that generates all the unique combinations of a given length from a list?

Source = [1,2,3]

uniqueCombos 2 Source = [[1,2],[1,3],[2,3]]

I tried looking in Hoogle but could not find a function that did this specifically. Permutations does not give the desired result.

Has anybody used a similar function before?

jpmarinier
  • 4,427
  • 1
  • 10
  • 23
newb346
  • 165
  • 1
  • 1
  • 6

7 Answers7

14

I don't know a predefined function either, but it's pretty easy to write yourself:

-- Every set contains a unique empty subset.
subsets 0 _ = [[]]

-- Empty sets don't have any (non-empty) subsets.
subsets _ [] = []

-- Otherwise we're dealing with non-empty subsets of a non-empty set.
-- If the first element of the set is x, we can get subsets of size n by either:
--   - getting subsets of size n-1 of the remaining set xs and adding x to each of them
--     (those are all subsets containing x), or
--   - getting subsets of size n of the remaining set xs
--     (those are all subsets not containing x)
subsets n (x : xs) = map (x :) (subsets (n - 1) xs) ++ subsets n xs
melpomene
  • 84,125
  • 8
  • 85
  • 148
  • 1
    It's interesting to note that if the order of appearance of `subsets 0 _ = [[]]` and `subsets _ [] = []` changes it breaks. – Redu Dec 22 '19 at 19:50
9

Using Data.List:

import Data.List
combinations k ns = filter ((k==).length) $ subsequences ns

Reference: 99 Haskell Problems

There are quite a few interesting solutions in the reference, I just picked a concise one.

cmaher
  • 5,100
  • 1
  • 22
  • 34
dopamane
  • 1,315
  • 2
  • 14
  • 27
  • 3
    [Pointfree](http://pointfree.io/): `combinations = (. subsequences) . filter . (. length) . (==)` – Asone Tuhid Jul 19 '20 at 12:41
  • The totally point-free one hurts my head. I find `combinations k = filter ((k==).length) . subsequences` a lot more readable. – nilo May 01 '23 at 18:46
  • Pointfree is really interesting. I'm actively programming in Haskell right now, and am starting to be able to parse these more complex pointfree expressions more easily. A question of being used to them, then, I guess, like so many other things... – nilo May 06 '23 at 10:07
4

It is not clear to me how concerned you would be about performance.

If it can be of any use, back in 2014, somebody posted some sort of performance contest of various Haskell combinations generating algorithms.

For combinations of 13 items out of 26, execution times varied from 3 to 167 seconds ! The fastest entry was provided by Bergi. Here is the non-obvious (for me at least) source code:

subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if (n > l) then []
                             else subsequencesBySize xs !! (l-n)
 where
   subsequencesBySize [] = [[[]]]
   subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                               in zipWith (++)
                                    ([]:next)
                                    ( map (map (x:)) next ++ [[]] )                 

More recently, the question has been revisited, in the specific context of picking a few elements from a large list (5 out of 100). In that case, you cannot use something like subsequences [1 .. 100] because that refers to a list whose length is 2100 ≃ 1.26*1030. I submitted a state machine based algorithm which is not as Haskell-idiomatic as I would like, but is reasonably efficient for that sort of situations, something around 30 clock cycles per output item.

Side note: Using multisets to generate combinations ?

Also, there is a Math.Combinatorics.Multiset package available. Here is the documentation. I have only briefly tested it, but it can be used to generate combinations.

For example, the set of all combinations of 3 elements out of 8 are just like the "permutations" of a multiset with two elements (Present and Absent) with respective multiplicities of 3 and (8-3)=5.

Let us use the idea to generate all combinations of 3 elements out of 8. There are (876)/(321) = 336/6 = 56 of them.

*L M Mb T MS> import qualified Math.Combinatorics.Multiset as MS
*Math.Combinatorics.Multiset L M Mb T MS> pms = MS.permutations
*Math.Combinatorics.Multiset L M Mb T MS> :set prompt "λ> "
λ> 
λ> pms38 = pms $ MS.fromCounts [(True, 3), (False,5)]
λ> 
λ> length pms38
56
λ>
λ> take 3 pms38
[[True,True,True,False,False,False,False,False],[True,True,False,False,False,False,False,True],[True,True,False,False,False,False,True,False]]
λ> 
λ> str = "ABCDEFGH"
λ> combis38 = L.map fn pms38 where fn mask = L.map fst $ L.filter snd (zip str mask)
λ> 
λ> sort combis38
["ABC","ABD","ABE","ABF","ABG","ABH","ACD","ACE","ACF","ACG","ACH","ADE","ADF","ADG","ADH","AEF","AEG","AEH","AFG","AFH","AGH","BCD","BCE","BCF","BCG","BCH","BDE","BDF","BDG","BDH","BEF","BEG","BEH","BFG","BFH","BGH","CDE","CDF","CDG","CDH","CEF","CEG","CEH","CFG","CFH","CGH","DEF","DEG","DEH","DFG","DFH","DGH","EFG","EFH","EGH","FGH"]
λ>
λ> length combis38
56
λ>

So functionally at least, the idea of using multisets to generate combinations works.

jpmarinier
  • 4,427
  • 1
  • 10
  • 23
2

There is no such operation in lib, but you can easily implement it yourself:

import Data.List

main = putStrLn $ show $ myOp 2 [1, 2, 3]

myOp :: Int -> [a] -> [[a]]
myOp 0 _ = []
myOp 1 l = map (:[]) l
myOp c l = concat $ map f $ tails l
    where
        f :: [a] -> [[a]]
        f []     = []
        f (x:xs) = map (x:) $ myOp (c - 1) xs
talex
  • 17,973
  • 3
  • 29
  • 66
2

@melpomene's answer is generic and very concise. That's probably what you see in many places over the internet where a combinationsOf function is required.

However hidden behind the double recursion it does tons of needless recursive calls those can be avoided, yielding a much efficient code. i.e we don't need to make any calls if the length of the list is shorter than k.

I would propose a double termination check.

combinationsOf :: Int -> [a] -> [[a]]
combinationsOf k xs = runner n k xs
                      where
                      n = length xs
                      runner :: Int -> Int -> [a] -> [[a]]
                      runner n' k' xs'@(y:ys) = if k' < n'      -- k' < length of the list
                                                then if k' == 1
                                                     then map pure xs'
                                                     else map (y:) (runner (n'-1) (k'-1) ys) ++ runner (n'-1) k' ys
                                                else pure xs'   -- k' == length of the list.

λ> length $ subsets 10 [0..19] -- taken from https://stackoverflow.com/a/52602906/4543207
184756
(1.32 secs, 615,926,240 bytes)

λ> length $ combinationsOf 10 [0..19]
184756
(0.45 secs, 326,960,528 bytes)

So the above code, while optimized as much as possible, is still inefficient mainly due to double recursion from within. As a rule of thumb, in any algorithm a double recursions should best be avoided or be considered under very careful inspection.

The following algorithm on the other hand, is a very efficient way of doing this job both on speed and memory consumption.

combinationsOf :: Int -> [a] -> [[a]]
combinationsOf k as@(x:xs) | k == 1    = map pure as
                           | k == l    = pure as
                           | k >  l    = []
                           | otherwise = run (l-1) (k-1) as $ combinationsOf (k-1) xs
                             where
                             l = length as

                             run :: Int -> Int -> [a] -> [[a]] -> [[a]]
                             run n k ys cs | n == k    = map (ys ++) cs
                                           | otherwise = map (q:) cs ++ run (n-1) k qs (drop dc cs)
                                           where
                                           (q:qs) = take (n-k+1) ys
                                           dc     = product [(n-k+1)..(n-1)] `div` product [1..(k-1)]

λ> length $ combinationsOf 10 [0..19]
184756
(0.09 secs, 51,126,448 bytes)
Redu
  • 25,060
  • 6
  • 56
  • 76
  • With the second code block in GHCi, `combinationsOf 3 [1 .. 3] == [1, 2, 3, 2, 3, 3]`, which is incorrect, and `combinationsOf 4 [1 .. 3]` froze my machine, I think due to swapping from memory exhaustion. – Chai T. Rex May 14 '21 at 21:42
  • @Chai T. Rex Thank you for heads up. I tried to fix it. However this is not a code for combinations with repetitions so the updated code won't allow `k > l`. – Redu May 15 '21 at 13:09
0

Monadic solution for unique combinations:

cb _ 0 = [[]]
cb xs n = (nxs >>= (\(nx, x) -> (x:) <$> (cb [z | (n,z) <- nxs, n>nx] (n-1)) )) where nxs = zip [1..] xs
Evg
  • 2,978
  • 5
  • 43
  • 58
0

I had the same problem. I tried looking for a solution in the standard library, but couldn't find any. I came up with this:

genCombinations :: Int -> [a] -> [[a]]
genCombinations 0 _ = [[]]
genCombinations n xs =
  genCombinations' [] xs n
  where
    -- \| Generates new combinations by replacing elements in the old combinations (the accumulator)
    -- \| with the first element from the list.
    genCombinations' :: [[a]] -> [a] -> Int -> [[a]]
    genCombinations' acc [] _ = acc
    genCombinations' acc (x : xs) n =
      -- replace elements in lists from accumulator, and add a combination made of the first element
      let newCombinations = concatMap (replaceElems x) acc ++ [replicate n x]
       in newCombinations ++ genCombinations' (acc ++ newCombinations) xs n

    replaceElems :: a -> [a] -> [[a]]
    replaceElems x xs =
      replaceElems' x (length xs) 0 xs

    replaceElems' :: a -> Int -> Int -> [a] -> [[a]]
    replaceElems' _ _ _ [] = []
    -- count - how many elements were replaced before
    -- n - total length of the combination
    replaceElems' x n count [y]
      | count == n - 1 = [[y]] -- all prievous elements were replaced, don't replace now
      | count == 0 = [[x]] -- no elements were replaced, replace now
      | otherwise = [[x], [y]]
    replaceElems' x n count (y : ys) =
      -- don't replace the element, don't increment the counter
      map (y :) (replaceElems' x n count ys)
        -- replace the element, increment the counter
        ++ map (x :) (replaceElems' x n (count + 1) ys)

Works by keeping a list of all generated combinations in the accumulator. When processing a next element from the input list (x), it creates new combinations, by replacing element's of lists in the accumulator with x in almost all possible ways. Then it adds a new combination, made of x-s only.

When replacing elements from previously generated combinations, it doesn't replace all their elements, or none, so no combination is generated twice.

What's cool is that it is lazy with respect to the input list's length, so it can generate all combinations of length n, even from an infinite list.

aurelia
  • 493
  • 8
  • 12