Why is this scala prime generation so slow/memory intensive?

Question

I run out of memory while finding the 10,001th prime number.

object Euler0007 {
  def from(n: Int): Stream[Int] = n #:: from(n + 1)
  def sieve(s: Stream[Int]): Stream[Int] = s.head #:: sieve(s.filter(_ % s.head != 0))
  def primes = sieve(from(2))
  def main(args: Array[String]): Unit = {
    println(primes(10001))
  }
}

Is this because after each "iteration" (is this the correct term in this context?) of primes, I increase the stack of functions to be called to get the next element by one?

One solution that I've found on the web which doesn't resort to an iterative solution (which I'd like to avoid to get into functional programming/idiomatic scala) is this (Problem 7):

lazy val ps: Stream[Int] = 2 #:: Stream.from(3).filter(i => ps.takeWhile(j => j * j <= i).forall(i % _ > 0))

From what I can see, this does not lead to this recursion-like way. Is this a good way to do it, or do you know of a better way?

if you tentatively add `@tailrec` to `sieve`, the compiler will tell you that `sieve` "contains a recursive call not in tail position"; so I suspect that's where your stack growth comes from. — 0__, Jul 23 '11 at 17:48

Landei · Accepted Answer · 2011-07-23T20:02:08.353

15

One reason why this is slow is that it isn't the sieve of Eratosthenes. Read http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf for a detailled explanation (the examples are in Haskell, but can be translated directly into Scala).

My old solution for Euler problem #7 wasn't the "true" sieve either, but it seems to work good enough for little numbers:

object Sieve {

    val primes = 2 #:: sieve(3)

    def sieve(n: Int) : Stream[Int] =
          if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
          else n #:: sieve(n + 2)

    def main(args: Array[String]) {
      println(primes(10000)) //note that indexes are zero-based
    }
}

I think the problem with your first version is that you have only defs and no val which collects the results and can be consulted by the generating function, so you always recalculate from scratch.

edited Jul 23 '11 at 20:02

answered Jul 23 '11 at 19:46

Landei

54,104
13
100
195

Thanks for the link, I'll read through that tomorrow. The way I understand the code now though is, that I just stack individual sieves on top of each other. Creating a new one for each prime, then passing that stack of sieves along to get the next prime, thus checking for all multiples of the already found primes through an increasing amount of nesting. Anyway, I'll come back to it tomorrow :) – phant0m Jul 23 '11 at 19:57
Ah, I understand now. Thank you very much for the PDF. Although some of the maths is beyond my current understanding, I understand now why it isn't the real sieve. – phant0m Jul 26 '11 at 10:34

score 8 · Answer 2 · edited May 23 '17 at 12:06

Yes, it is because you "increase the stack of functions to be called to get the next element, by one after each "iteration" " - i.e. add a new filter on top of stack of filters each time after getting each prime. That's way too many filters.

This means that each produced prime gets tested by all its preceding primes - but only those below its square root are really needed. For instance, to get the 10001-th prime, 104743, there will be 10000 filters created, at run-time. But there are just 66 primes below 323, the square root of 104743, so only 66 filters were really needed. All the 9934 others will be there needlessly, taking up memory, hard at work producing absolutely no added value.

This is the key deficiency of that "functional sieve", which seems to have originated in the 1970s code by David Turner, and later have found its way into the SICP book and other places. It is not that it's a trial division sieve (rather than the sieve of Eratosthenes). That's far too remote a concern for it. Trial division, when optimally implemented, is perfectly capable of producing the 10000th prime very fast.

The key deficiency of that code is that it does not postpone the creation of filters to the right moment, and ends up creating far too many of them.

Talking complexities now, the "old sieve" code is O(n²), in n primes produced. The optimal trial division is O(n^1.5/log^0.5(n)), and the sieve of Eratosthenes is O(n*log(n)*log(log(n))). As empirical orders of growth the first is seen typically as ~ n^2, the second as ~ n^1.45 and the third ~ n^1.2.

You can find Python generators-based code for optimal trial division implemented in this answer (2nd half of it). It was originally discussed here dealing with the Haskell equivalent of your sieve function.

Just as an illustration, a "readable pseudocode" :) for the old sieve is

primes = sieve [2..]  where
   sieve (x:xs) = x : sieve [ y | y <- xs, rem y x > 0 ]
                         -- list of 'y's, drawn from 'xs',
                         --         such that (y % x > 0)

and for optimal trial division (TD) sieve, synchronized on primes' squares,

primes = sieve [2..] primes   where
  sieve (x:xs) ps = x : (h ++ sieve [ y | y <- t, rem y p > 0 ] qs)
          where
            (p:qs) = ps     -- 'p' is head elt in 'ps', and 'qs' the rest
            (h,t)  = span (< p*p) xs    -- 'h' are elts below p^2 in 'xs'
                                        -- and 't' are the rest

and for a sieve of Eratosthenes, devised by Richard Bird, as seen in that JFP article mentioned in another answer here,

primes = 2 : minus [3..] 
               (foldr (\p r-> p*p : union [p*p+p, p*p+2*p..] r) [] primes)
                      -- function of 'p' and 'r', that returns 
                      -- a list with p^2 as its head elt, ...

Short and fast. (minus a b is a list a with all the elts of b progressively removed from it; union a b is a list a with all the elts of b progressively added to it without duplicates; both dealing with ordered, non-decreasing lists). foldr is the right fold of a list. Because it is linear this runs at ~ n^1.33, to make it run at ~ n^1.2 the tree-like folding function foldi can be used).

The answer to your second question is also a yes. Your second code, re-written in same "pseudocode",

ps = 2 : [i | i <- [3..], all ((> 0).rem i) (takeWhile ((<= i).(^2)) ps)]

is very similar to the optimal TD sieve above - both arrange for each candidate to be tested by all primes below its square root. While the sieve arranges that with a run-time sequence of postponed filters, the latter definition re-fetches the needed primes anew for each candidate. One might be faster than another depending on a compiler, but both are essentially the same.

And the third is also a yes: the sieve of Eratosthenes is better,

ps = 2 : 3 : minus [5,7..] (unionAll [[p*p, p*p+2*p..] | p <- drop 1 ps])

unionAll = foldi union' []          -- one possible implementation
union' (x:xs) ys = x : union xs ys
   -- unconditionally produce first elt of the 1st arg 
   -- to avoid run-away access to infinite lists

It looks like it can be implemented in Scala too, judging by the similarity of other code snippets. (Though I don't know Scala). unionAll here implements tree-like folding structure (click for a picture and full code) but could also be implemented with a sliding array, working segment by segment along the streams of primes' multiples.

TL;DR: yes, yes, and yes.

Daniel C. Sobral · Answer 3 · 2014-04-07T03:21:17.000

6

FWIW, here's a real Sieve of Eratosthenes:

def sieve(n: Int) = (2 to math.sqrt(n).toInt).foldLeft((2 to n).toSet) { (ps, x) => 
    if (ps(x)) ps -- (x * x to n by x) 
    else ps
}

Here's an infinite stream of primes using a variation on the Sieve of Eratosthenes that preserves its fundamental properties:

case class Cross(next: Int, incr: Int)

def adjustCrosses(crosses: List[Cross], current: Int) = {
  crosses map {
    case cross @ Cross(`current`, incr) => cross copy (next = current + incr)
    case unchangedCross                 => unchangedCross
  }
}

def notPrime(crosses: List[Cross], current: Int) = crosses exists (_.next == current)

def sieve(s: Stream[Int], crosses: List[Cross]): Stream[Int] = {
    val current #:: rest = s

    if (notPrime(crosses, current)) sieve(rest, adjustCrosses(crosses, current))
    else current #:: sieve(rest, Cross(current * current, current) :: crosses)
}

def primes = sieve(Stream from 2, Nil)

This is somewhat difficult to use, however, since each element of the Stream is composed using the crosses list, which has as many numbers as there have been primes up to a number, and it seems that, for some reason, these lists are being kept in memory for each number in the Stream.

For example, prompted by a comment, primes take 6000 contains 56993 would throw a GC exception whereas primes drop 5000 take 1000 contains 56993 would return a result rather fast on my tests.

edited Apr 07 '14 at 03:21

answered Jul 26 '11 at 03:12

Daniel C. Sobral

295,120
86
501
681

+1: Thanks for the implementation! Based on the PDF and the nature of the "actual algorithm", I take it I cannot create a generator of primes without limiting it to "all primes below n" ? – phant0m Jul 26 '11 at 10:42
@phant0m You _can_ do it. One way would be to keep a list of pairs `(next, increment)`, and recompute them at each iteration. I'll try to write it. – Daniel C. Sobral Jul 26 '11 at 15:15
@phant0m There. It is a bit overly verbose, in the hope that it will make the algorithm easier to understand. – Daniel C. Sobral Jul 26 '11 at 15:59
I'm just beginning in Scala so bear with me, but it seems that copy and pasting the code for `sieve` and then `val primes = sieve(99999); sieve contains 56993` yields `false`... – Mullefa Apr 05 '14 at 18:57
@Mullefa I presume you mean `primes contains 56993`? It should return true if you don't hit a memory exception (see comment I added to the end of the answer). Though `contains` is a rather dangerous method here -- if the number isn't a prime, it will never return. – Daniel C. Sobral Apr 05 '14 at 20:21
Yep, sorry - meant `primes contains 56993`. I am calling `contains` on your first version of `sieve` - I don't get a memory error, and get a value of `false` ?! – Mullefa Apr 05 '14 at 20:27
1

@Mullefa Ah, the _first_ version! It happened because of int wrap-around -- 56993 squared does not fit an `Int`, which causes `x * x` on the second line to delete numbers it shouldn't. I've modified the code slightly to ensure that `x` will never be more than required to check whether `n` is prime. – Daniel C. Sobral Apr 07 '14 at 03:23
BTW perhaps the proper fix for your second code, the "crosses" sieve, is to *postpone* the addition of new crosses until a prime's square is seen among the candidates (instead of for each discovered prime) as can be seen [in this Python-based answer of mine](http://stackoverflow.com/a/10733621/849891) and explained [here below](http://stackoverflow.com/a/14821313/849891). – Will Ness Jul 17 '14 at 19:55

Why is this scala prime generation so slow/memory intensive?

3 Answers3

Linked