Scala performance - Sieve

Question

Right now, I am trying to learn Scala . I've started small, writing some simple algorithms . I've encountered some problems when I wanted to implement the Sieve algorithm from finding all all prime numbers lower than a certain threshold .

My implementation is:

import scala.math

object Sieve {

    // Returns all prime numbers until maxNum
    def getPrimes(maxNum : Int) = {
        def sieve(list: List[Int], stop : Int) : List[Int] = {
            list match {
                case Nil => Nil
                case h :: list if h <= stop => h :: sieve(list.filterNot(_ % h == 0), stop)
                case _ => list
            }
        }
        val stop : Int = math.sqrt(maxNum).toInt
        sieve((2 to maxNum).toList, stop)
    }

    def main(args: Array[String]) = {
        val ap = printf("%d ", (_:Int)); 

        // works
        getPrimes(1000).foreach(ap(_))

        // works 
        getPrimes(100000).foreach(ap(_))

        // out of memory
        getPrimes(1000000).foreach(ap(_))

    }
}

Unfortunately it fails when I want to computer all the prime numbers smaller than 1000000 (1 million) . I am receiving OutOfMemory .

Do you have any idea on how to optimize the code, or how can I implement this algorithm in a more elegant fashion .

PS: I've done something very similar in Haskell, and there I didn't encountered any issues .

See http://stackoverflow.com/questions/3737637/why-is-my-algorithm-for-project-euler-problem-12-so-slow — Aaron Novstrup, Jan 04 '11 at 17:23

score 4 · Answer 1 · answered Jan 04 '11 at 17:54

4

The code in question is not tail recursive, so Scala cannot optimize the recursion away. Also, Haskell is non-strict by default, so you can't hardly compare it to Scala. For instance, whereas Haskell benefits from foldRight, Scala benefits from foldLeft.

There are many Scala implementations of Sieve of Eratosthenes, including some in Stack Overflow. For instance:

(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps))

answered Jan 04 '11 at 17:54

Daniel C. Sobral

295,120
86
501
681

I found this particular on-liner rather unintuitive . I will search on google more about writing a decent implementation of the Sieve in Scala . And I will try to dissect the solution you gave me . Thank you for your comment . – Andrei Ciobanu Jan 04 '11 at 18:58
@Andrei: I think that seeing how the [forward pipe operator](http://stevegilham.blogspot.com/2009/01/pipe-operator-in-scala.html) (which isn't part of the standard library) is implemented may make it easier for you to understand that one-liner. – Ken Bloom Jan 04 '11 at 22:58
@Andrei: the one liner basically starts with a set full of all of the numbers from `2 to n`, and then iterates through all `x` from `2 to n` to remove all composite numbers (multiples of x) from the set. – Ken Bloom Jan 04 '11 at 23:00
@Andrei You may research Stack Overflow itself, as this question has been posted here already. – Daniel C. Sobral Jan 05 '11 at 02:49

score 4 · Accepted Answer · answered Jan 04 '11 at 19:43

4

I would go with an infinite Stream. Using a lazy data structure allows to code pretty much like in Haskell. It reads automatically more "declarative" than the code you wrote.

import Stream._

val primes = 2 #:: sieve(3)

def sieve(n: Int) : Stream[Int] =
      if (primes.takeWhile(p => p*p <= n).exists(n % _ == 0)) sieve(n + 2)
      else n #:: sieve(n + 2)

def getPrimes(maxNum : Int) = primes.takeWhile(_ < maxNum)

Obviously, this isn't the most performant approach. Read The Genuine Sieve of Eratosthenes for a good explanation (it's Haskell, but not too difficult). For real big ranges you should consider the Sieve of Atkin.

answered Jan 04 '11 at 19:43

Landei

54,104
13
100
195

What makes you thing that the Sieve of Atkin (SoA) is magic and necessarily better than a genuine Sieve of Eratosthenes (SoE) using the maximum in wheel factorization techniques? BTW, I'm sure you realize that the answer code isn't SoE - it is a reasonably optimized Trial Division. The Sieve of Atkin doesn't have less operations than a maximally wheel factorized practical SoE until up in the billions number ranges, and then its individual operations are considerably more complex so that a practical application never catches up to a practical application of the SoE. – GordonBGood Dec 04 '14 at 10:13
@GordonBGood Wikipedia makes me think so: "This sieve [of Atkin] computes primes up to N using O(N/log log N) operations with only N^(1/2 + o(1)) bits of memory. That is a little better than the sieve of Eratosthenes which uses O(N) operations and O(N^(1/2)*(log log N)/log N) bits of memory." – Landei Dec 04 '14 at 10:18
Asymptotic computation complexity doesn't compare run times; it only estimates increasing run time with range **when compared to itself**. The SoA has a constant factor of about 0.26 times the range in toggling/culling operations (ops), which would make it run at about 63% of the time for their reference SoE at about 0.4 the range culling ops for 1 billion if the ops were the same, but they aren't, as it only runs about 10% faster. Using maximum practical wheel factorization as in primesieve would reduce ops for SoE to 0.25 at no increase in time per op, so SoE is 70% time for a billion. – GordonBGood Dec 04 '14 at 10:58
cont'd: If the time per op for both SoA and SoE were to change at the same rate, SoA would catch the Maximally Wheel Factorized (MWF) SoE at about 1e15, but that would require about 28 TerraBytes of RAM memory for a single array and isn't practical. A page segmented approach needs to be used for both; the MWF SoE adapts much better to that as it uses a constant stride in culling ops, whereas the SoA has a varying stride as a square law, which is impossible to implement efficiently for ranges above about 4 billion. Thus, the MWF SoE whups the SoA even for large ranges. – GordonBGood Dec 04 '14 at 11:20
Sorry, how *could* I mention the SoA and hence insult the superior SoE with MWF? And how *could* I forget about the culling ops? Seriously, the question is now 3 years old, and in the mean time the TO found the primes below 1M for sure, even if he did it manually... – Landei Dec 04 '14 at 16:02
1

Yes, the question is old but still useful as a reference, and many assume as you did that the SoA is the best prime sieve vs. SoE because of the Wikipedia (WP) article and not understanding the (in)significance of asymptotic computational complexity in comparing the actual run times of different algorithms. In fact, the simplifications in the pseudo code in WP produces a constant ratio increase for the SoA of about 50% more ops for any range than the true SoA. The Bernstein and Atkin comparison to SoE was flawed in that SoE didn't MWF, maybe because SoE would then win their comparison. – GordonBGood Dec 04 '14 at 21:10

score 2 · Answer 3 · edited May 23 '17 at 10:25

The following answer is about a 100 times faster than the "one-liner" answer using a Set (and the results don't need sorting to ascending order) and is more of a functional form than the other answer using an array although it uses a mutable BitSet as a sieving array:

object SoE {
  def makeSoE_Primes(top: Int): Iterator[Int] = {
    val topndx = (top - 3) / 2
    val nonprms = new scala.collection.mutable.BitSet(topndx + 1)
    def cullp(i: Int) = {
      import scala.annotation.tailrec; val p = i + i + 3
      @tailrec def cull(c: Int): Unit = if (c <= topndx) { nonprms += c; cull(c + p) }
      cull((p * p - 3) >>> 1)
    }
    (0 to (Math.sqrt(top).toInt - 3) >>> 1).filterNot { nonprms }.foreach { cullp }
    Iterator.single(2) ++ (0 to topndx).filterNot { nonprms }.map { i: Int => i + i + 3 }
  }
}

It can be tested by the following code:

object Main extends App {
  import SoE._
  val top_num = 10000000
  val strt = System.nanoTime()
  val count = makeSoE_Primes(top_num).size
  val end = System.nanoTime()
  println(s"Successfully completed without errors. [total ${(end - strt) / 1000000} ms]")
  println(f"Found $count primes up to $top_num" + ".")
  println("Using one large mutable1 BitSet and functional code.")
}

With the results from the the above as follows:

Successfully completed without errors. [total 294 ms]
Found 664579 primes up to 10000000.
Using one large mutable BitSet and functional code.

There is an overhead of about 40 milliseconds for even small sieve ranges, and there are various non-linear responses with increasing range as the size of the BitSet grows beyond the different CPU caches.

score 1 · Answer 4 · answered Jan 04 '11 at 18:00

1

It looks like List isn't very effecient space wise. You can get an out of memory exception by doing something like this

1 to 2000000 toList

answered Jan 04 '11 at 18:00

Monkey

620
5
4

score 1 · Answer 5 · answered Jan 04 '11 at 22:15

I "cheated" and used a mutable array. Didn't feel dirty at all.

  def primesSmallerThan(n: Int): List[Int] = {
    val nonprimes = Array.tabulate(n + 1)(i => i == 0 || i == 1)
    val primes = new collection.mutable.ListBuffer[Int]
    for (x <- nonprimes.indices if !nonprimes(x)) {
      primes += x
      for (y <- x * x until nonprimes.length by x if (x * x) > 0) {
        nonprimes(y) = true
      }
    }
    primes.toList
  }

Scala performance - Sieve

5 Answers5

Linked