0

Here we need to get all subsequences of the given length. How to calculate the asymptotic complexity of the given function?

import Data.List

subsequencesOfSize l n = [x | x <- subsequences l, length x == n]

How many "iterations" will it be? Can we speak about "iterations". Any optimizations? GHC 7.10.2. How compiler or programmer can improve this?

Erik Kaplun
  • 37,128
  • 15
  • 99
  • 111
Dmitry
  • 337
  • 2
  • 12
  • Are you interested in the theoretical complexity or specifically compiled output? – sinelaw Nov 03 '15 at 10:49
  • 2
    Generating a huge list and then filtering out some potentially small parts from it is hardly efficient. There could be a better approach, i.e. a different algorithm, which will prune long subsequences as early as possible. – chi Nov 03 '15 at 10:59
  • 3
    @SDmitry: reversing the argument order (i.e. `subsequencesOfSize n l`) would make for a more idiomatic design, just for the record. – Erik Kaplun Nov 03 '15 at 11:03
  • It can be ineffitient if we can do it without optimisations. But it is a part of my question. Can GHC make such optimizations here after which this approach can be comparable with the early-filtering? And theoretical complexity interesting too. – Dmitry Nov 03 '15 at 11:04
  • 1
    along the lines of dfeuer's suggestion, you might want to check out this: http://stackoverflow.com/a/33016778/247623 — basically you should use asomething like `lengthIsGt` to pre-filter the lists before you call `length` on them, so as to avoid evaluating very long lists. in fact, your current algorithm will loop forever if any of the lists in `l` is infinite. – Erik Kaplun Nov 03 '15 at 11:06
  • I forgot about this "length" characteristic! – Dmitry Nov 03 '15 at 11:16

2 Answers2

1

I think your question contains at least 2 distinct questions, but whether (current versions of) GHC can optimize this

subsequencesOfSize l n = [x | x <- subsequences l, length x == n]

into something like

subsequencesOfSize l n = [x | x <- subsequences l, lengthIsGt n x, length x == n]

is highly doubtable — I'm nowhere near being an expert on optimization, supercompilation etc, but inferring the need for something like lengthIsGt from the use of length here seems highly non-trivial and is unlikely to be part of a compiler's set of generic optimizations.

P.S. actually the version with lengthIsGt will potentially still try to evaluate infinite lists if the condition containing length is evaluated before the condition containing lengthIsGt (but that was pseudocode anyway)

Erik Kaplun
  • 37,128
  • 15
  • 99
  • 111
1

Here is a related SO question which discusses various implementations of subsequencesOfSize:

Haskell: comparison of techniques for generating combinations

and the best performing one which uses memoization:

Haskell: comparison of techniques for generating combinations

Update

Also consider this version that @user3237465 mentions in the comments:

subsequencesOfSize :: Int -> [a] -> [[a]]
subsequencesOfSize n xs | n > length xs = []
subsequencesOfSize n xs = subsequencesBySize xs !! n where
    subsequencesBySize    []  = [[[]]]
    subsequencesBySize (x:xs) = zipWith (++) ([] : map (map (x:)) next) (next ++ [[]])
        where next = subsequencesBySize xs
Community
  • 1
  • 1
ErikR
  • 51,541
  • 9
  • 73
  • 124