5

I was doing a few of the 99 Haskell Problems earlier and I thought that exercise 27 ("write a function to enumerate the possible combinations") was interesting as it's a simple concept and it lends itself to multiple implementations.

I was curious about relative efficiency so I decided to run a couple of different implementations - results are in the table below. (For reference: Emacs bash ansi-term in LXDE (Ubuntu 14.04) running on VirtualBox; Thinkpad X220; 8gb RAM, i5 64bit 2.4ghz.)


TL;DR:

(i) Why are combination-generating techniques #7 and #8 (from the table below; code included at bottom of post) so much faster than the rest?
(ii) Also, what do the figures in the Bytes column actually represent?


(i) It's odd because function #7 works by filtering the powerset (which is waaaay larger than the combinations list); I suspect this is laziness at work, i.e., that this is the function which is most effectively exploiting the fact that we've only asked for the length of the list and not the list itself. (Also, its 'memory usage' is lower than that of the other functions, but, then again, I'm not sure exactly what memory-related stat is being shown.)

Regarding function #8: kudos to Bergi for that ridiculously fast implementation and thanks to user5402 for suggesting the addition. Still trying to wrap my ahead around the speed difference of this one.

(ii) The figures in the Bytes column are reported by GHCi after running the :set +s command; they clearly don't represent max memory usage as I only have ~25gb of RAM + free HD space.)?

enter image description here

Code:

import Data.List

--algorithms to generate combinations
--time required to compute the following: length $ 13 "abcdefghijklmnopqrstuvwxyz"

--(90.14 secs, 33598933424 bytes)
combDC1 :: (Eq a) => Int -> [a] -> [[a]]
combDC1 n xs = filter (/= []) $ combHelper n n xs []

combHelper :: Int -> Int -> [a] -> [a] -> [[a]]
combHelper n _ []        chosen           = if length chosen == n
                                               then [chosen]
                                               else [[]]
combHelper n i remaining chosen
  | length chosen == n = [chosen]
  | n - length chosen > length remaining = [[]]                     
  | otherwise = combHelper n (i-1) (tail remaining) ((head remaining):chosen) ++
                combHelper n i (tail remaining) chosen

--(167.63 secs, 62756587760 bytes)
combSoln1 :: Int -> [a] -> [([a],[a])]
combSoln1 0 xs = [([],xs)]
combSoln1 n [] = []
combSoln1 n (x:xs) = ts ++ ds
  where
    ts = [ (x:ys,zs) | (ys,zs) <- combSoln1 (n-1) xs ]
    ds = [ (ys,x:zs) | (ys,zs) <- combSoln1 n     xs ]

--(71.40 secs,  30480652480 bytes)
combSoln2 :: Int -> [a] -> [[a]]
combSoln2 0 _  = [ [] ]
combSoln2 n xs = [ y:ys | y:xs' <- tails xs
                           , ys <- combSoln2 (n-1) xs']

--(83.75 secs, 46168207528 bytes)
combSoln3 :: Int -> [a] -> [[a]]
combSoln3 0 _  = return []
combSoln3 n xs = do
  y:xs' <- tails xs
  ys <- combSoln3 (n-1) xs'
  return (y:ys)

--(92.34 secs, 40541644232 bytes)
combSoln4 :: Int -> [a] -> [[a]]
combSoln4 0 _ = [[]]
combSoln4 n xs = [ xs !! i : x | i <- [0..(length xs)-1] 
                               , x <- combSoln4 (n-1) (drop (i+1) xs) ]

--(90.63 secs, 33058536696 bytes)
combSoln5 :: Int -> [a] -> [[a]]
combSoln5 _ [] = [[]]
combSoln5 0 _  = [[]]
combSoln5 k (x:xs) = x_start ++ others
    where x_start = [ x : rest | rest <- combSoln5 (k-1) xs ]
          others  = if k <= length xs then combSoln5 k xs else []


--(61.74 secs, 33053297832 bytes)                                                                         
combSoln6 :: Int -> [a] -> [[a]]
combSoln6 0 _ = [[]]
combSoln6 _ [] = []
combSoln6 n (x:xs) = (map (x:) (combSoln6 (n-1) xs)) ++ (combSoln6 n xs)


--(8.41 secs, 10785499208 bytes)
combSoln7 k ns = filter ((k==).length) (subsequences ns)


--(3.15 secs, 2889815872 bytes)
subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
 where
   subsequencesBySize [] = [[[]]]
   subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                               in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])                 
iceman
  • 2,020
  • 2
  • 17
  • 24
  • 1
    The very short, almost certainly incomplete answer: critically, both 7 and 8 memoizes all the previous results - whereas the others recompute things multiple times - notice they *all* have multiple recursive calls; and are sufficiently lazy - whereas the others may force "intermediate" results with length and the like. – user2407038 Nov 04 '14 at 05:54
  • 7 is not fast, try this: `combSoln7 2 [1..30]`; it looks fast because 26!/(13! * 13!) = 10400600 and this is only 6 times smaller, than 2^26 = 67108864. Also, 6 can be faster, than 8, look at http://ideone.com/TdCWiK and http://ideone.com/ojnrB3 (but I don't know, which optimizations ideone.com perform). – effectfully Nov 04 '14 at 20:43
  • Perhaps you may like to add [this algorithm](https://stackoverflow.com/a/59932616/4543207) to your tests. It's solving C(26,13) as fast as the best solution above but the result is in lexicographical order. However it's much faster and memory efficient when tested against C(100,5). – Redu Jan 27 '20 at 14:03
  • 1
    unfortunately, this comparison is not *that* useful, since performance-critical code is usually compiled with optimizations. GHCi however is not doing that and hence real performance will vary wildly. – zabeltech May 28 '21 at 07:22

2 Answers2

2

You should also test the algorithm found in this SO answer:

subsequences of length n from list performance

subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs = let l = length xs
                          in if n>l then [] else subsequencesBySize xs !! (l-n)
 where
   subsequencesBySize [] = [[[]]]
   subsequencesBySize (x:xs) = let next = subsequencesBySize xs
                             in zipWith (++) ([]:next) (map (map (x:)) next ++ [[]])

On my machine I get the following timing and memory usage from ghci:

ghci> length $ combSoln7   13 "abcdefghijklmnopqrstuvwxyz"
10400600
(13.42 secs, 10783921008 bytes)

ghci> length $ subsequencesOfSize  13 "abcdefghijklmnopqrstuvwxyz"
10400600
(6.52 secs, 2889807480 bytes)
Community
  • 1
  • 1
ErikR
  • 51,541
  • 9
  • 73
  • 124
1
fact :: (Integral a) => a -> a
fact n = product [1..n]

ncombs n k = -- to evaluate number of combinations
    let n' = toInteger n
        k' = toInteger k
    in div (fact n') ((fact k') * (fact (n' - k')))

combinations :: Int -> [a] -> [[a]]
combinations 0 xs = [[]]
combinations 1 xs = [[x] | x <- xs]
combinations n xs =
    let ps = reverse [0..n - 1]
        inc (p:[])
            | pn < length xs = pn:[]
            | otherwise = p:[]
            where pn = p + 1
        inc (p:ps)
            | pn < length xs = pn:ps
            | (head psn) < length xs = inc ((head psn):psn)
            | otherwise = (p:ps)
            where pn = p + 1
                  psn = inc ps
        amount = ncombs (length xs) n
        pointers = take (fromInteger amount) (iterate inc ps)
        c' xs ps = map (xs!!) (reverse ps)
    in map (c' xs) pointers

I am learning Haskell and found a comparably fast implementation. I had a hard time with the type system with some functions requiring Ints and some fractional numbers and some Integers. On my computer the fastest solution presented here takes about 6,1 seconds to run and mine takes 3,5 to 2,9 seconds.

jogo3000
  • 21
  • 3