2

I'm trying to learn haskell and implemented a function conseq that would return a list of consecutive elements of size n.

conseq :: Int -> [Int] -> [[Int]]
conseq n x 
      | n == length(x) = [x]
      | n > length(x) = [x]
      | otherwise = [take n x] ++ (conseq n (drop 1 x))

This works correctly.

> take 5 $ conseq 2 [1..10]  
[[1,2],[2,3],[3,4],[4,5],[5,6]]

However, if I pass [1..] instead of [1..10], the program gets stuck in an infinite loop.

As I understood it, haskell has lazy evaluation so I should still be able to get the same result right? Is it length? Shouldn't the first two conditions evaluate to false as soon as the length becomes greater than n?

What did I misunderstand?

  • 6
    What's `length(x)`, where `x == [1..]`? – ForceBru Nov 16 '20 at 19:22
  • 1
    Is `length` not evaluated lazily? – Tessa Altman Nov 16 '20 at 19:26
  • 3
    Well, how do you evaluate it lazily? Execution can't proceed without knowing the length of this list, so it has to be computed immediately. I'm not completely sure about that though – ForceBru Nov 16 '20 at 19:27
  • [The documentation](https://hackage.haskell.org/package/base-4.14.0.0/docs/GHC-List.html#v:length) says that `list` "returns the length of a _finite_ list" – ForceBru Nov 16 '20 at 19:28
  • `length` is implemented recursively: `length [] = 0; length (_:xs) = 1 + length xs;` (possibly the actual implementation is more efficient, I'm not sure, but that's certainly in principle how it's done). Do you now see the problem of trying to evaluate `length` on an infinite list? – Robin Zigmond Nov 16 '20 at 19:30
  • 1
    @TessaAltman: everything is evaluated lazily, but when it is needed, it *is* evaluated, and then `length` got stuck in an infinite loop. – Willem Van Onsem Nov 16 '20 at 19:30
  • 1
    You might be thinking that `length` only needs to be evaluated up to the point where the comparison would be falsified, but that's not the case. `n > length x`, for example, can only evaluate to `False` one the value of `length x` is known, not after the recursion reaches `n + 1 + length xs`. Laziness doesn't imply short-circuiting. – chepner Nov 16 '20 at 19:33
  • 5
    **If** `length` returned something like a Peano number and **if** the comparisons for Peano numbers were lazily implemented it might be possible to have `n > length x` avoid an infinite loop. But this just isn't the case using `Int` and built-in `length`... – Bakuriu Nov 16 '20 at 19:35
  • @Bakuriu: well for a `genericLength` one can indeed use *Peano* numbers, and thus avoid this, but it would mean that comparisons, calculations, etc. all would take significant amounts of memory and time. – Willem Van Onsem Nov 16 '20 at 19:41
  • You might want to look at `Data.List.unfoldr`, which generates the result productively even when the input is an infinite list. Something like `unfoldr (\xs -> if null xs then Nothing else Just (take n xs, tail xs))`. You might need to tweak it depending on how you want to handle the last few elements of a finite list. – chepner Nov 16 '20 at 22:24
  • By the way, you might also find `Data.List.tails` handy in solving this problem ;) – luqui Nov 17 '20 at 03:43
  • 1
    A perhaps useful point which has not been explicitly mentioned: while it is possible to partially-evaluate a list, there is no such thing as a partially evaluated Int. It is either totally unevaluated, or evaluated all the way to a concrete number. Thus, comparisons against it cannot be lazy. – amalloy Nov 17 '20 at 04:01

2 Answers2

4

The first thing your function does is calculate length(x), so it knows whether it should return [x], [x], or [take n x] ++ (conseq n (drop 1 x))

length counts the number of elements in the list - all the elements. If you ask for the length of an infinite list, it never finishes counting.

user253751
  • 57,427
  • 7
  • 48
  • 90
4

One of the main reasons why using length is not a good idea is because when it has to be evaluated on an infinite list, it will get stuck in an infinite loop.

The good news is however, we don't need length. It would also make the time complexity worse. We can work with two enumerators, one is n-1 places ahead of the other. If this enumerator reaches the end of the list, then we know that the first enumerator still has n-1 elements, and thus we can stop yielding values:

conseq :: Int -> [a] -> [[a]]
conseq n ys = go (drop (n-1) ys) ys
    where go [] _ = []
          go (_:as) ba@(~(_:bs)) = take n ba : go as bs

This gives us thus:

Prelude> conseq 3 [1 ..]
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7],[6,7,8],[7,8,9],[8,9,10],[9,10,11],[10,11,12],[11,12,13],[12,13,14],[13,14,15],[14,15,16],[15,16,17],[16,17,18],[17,18,19],[18,19,20],[19,20,21],[20,21,22],[21,22,23],[22,23,24],[23,24,25],[24,25,26],[25,26,27],…
Prelude> conseq 3 [1 .. 4]
[[1,2,3],[2,3,4]]
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • What exactly is `~ba@(_:bs)` doing? I assume `(_:bs)` is dropping the head of the list and then setting `bs` to the rest like a normal pattern match, but I haven't come across the `~` or `@` notation before – mattematt Nov 17 '20 at 08:19
  • 1
    @mattematt: the `@` is an *as pattern* (https://stackoverflow.com/q/27467650/67579) and `~` is used for an *irrefutable* pattern https://www.haskell.org/tutorial/patterns.html – Willem Van Onsem Nov 17 '20 at 08:26