Understanding the Limitations of Lazy Evaluation (Sieve of Eratosthenes)

Question

In the Haskell Wiki article on prime numbers, the following implementation of the Sieve of Eratosthenes is described:

primes = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- tail primes])

When doing...

primes !! 2

... how does Haskell recognize in this specific situation not to try all p's in the tail of primes (a.k.a [3..]), but instead only does a set minus with 3?

In other words: how does Haskell know that any of the other p's (or multiples thereof) will not match 5, which is the eventual answer. Is there a good rule of thumb to know when the compiler is smart enough to handle infinite cases?

Haskell itself wouldn't know. but the implementation of `minus` might. Given that in that code fragment we have that `minus`, `union` and `unionAll` come from `Data.List.Ordered`, that means that at some point the fact that the infinite list you use is ordered is used to short-circuit the computation. What the runtime will guarantee is that as soon as it knows the 3rd element of the list (that is, it knows the list is of the form `a : b : c : rest`), it will stop the computation of the rest of the list. — Mor A., Sep 20 '18 at 23:38

score 3 · Accepted Answer · answered Sep 20 '18 at 23:47

(!!) only demands that primes be evaluated enough to find out that there are at least 3 elements, and what the third element is. To get that third element, we need to start evaluating the call to minus.

minus assumes that both its arguments are sorted. This is described in the docs, and is satisfied in the definition of primes. The first comparison minus performs is between 5 and 9, and this shows that 5 is the first element of the result. In the definition of minus this is the case LT -> x : loop xs (y:ys).

(!!) now has the third element of primes, so evaluation does not continue in primes or minus or unionAll. This back-and-forth between evaluation of subexpressions and pattern matching in the outer expressions is lazy evaluation.

Ah thank you that clears things up! To clarify if I understand correctly, does this mean that unionAll is able to short circuit calculation early because it knows that the first list is the _least_ (documentation: "the assumption that the heads of the inner lists are sorted"). Also am I using the proper terminology to describe this? Thank you again! — Wilson, Sep 21 '18 at 00:11

Will Ness · Answer 2 · 2018-09-21T08:46:25.950

Actually, the crux is in the implementation of unionAll. minus just pulls elements from its right argument one by one unawares (it assumes both its arguments are non-decreasing of course).

First, let's re-write it as

primes = 2 : ps 
ps     = 3 : t
t      = minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- ps])

-- primes !! 2 == ps !! 1 == head t

       = minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- (3 : t)])
       = minus [5,7..] (unionAll ([9, 15..] : [[p*p, p*p+2*p..] | p <- t]))

Now unionAll is smart: it relies on the assumption (here, fact) that in unionAll xs it holds that (map head xs) are non-decreasing.

As such, it knows it does not have to compare 9 with anything! So it just produces it unconditionally (you can consult its definition to check it for yourself):

       = minus [5,7..] 
               (9 : union [15, 21..] (unionAll ........))

Thus minus has something to compare the 5 and the 7 with, and produces:

       = 5 : 7 : minus [9,11..] 
                       (9 : union [15, 21..] (unionAll ........))

All this from knowing just the first odd prime, 3.

Understanding the Limitations of Lazy Evaluation (Sieve of Eratosthenes)

2 Answers2