Why is filter in front of foldLeft slow in Scala?

Question

I wrote an answer to the first Project Euler question:

Add all the natural numbers below one thousand that are multiples of 3 or 5.

The first thing that came to me was:

(1 until 1000).filter(i => (i % 3 == 0 || i % 5 == 0)).foldLeft(0)(_ + _)

but it's slow (it takes 125 ms), so I rewrote it, simply thinking of 'another way' versus 'the faster way'

(1 until 1000).foldLeft(0){
    (total, x) =>
        x match {
            case i if (i % 3 == 0 || i % 5 ==0) => i + total // Add
            case _ => total //skip
        }
}

This is much faster (only 2 ms). Why? I'm guess the second version uses only the Range generator and doesn't manifest a fully realized collection in any way, doing it all in one pass, both faster and with less memory. Am I right?

Here the code on IdeOne: http://ideone.com/GbKlP

How did you measure that the code is "orders of magnitude faster"? On my *extremely* old, *extremely* slow laptop, in the Scala *interpreter*, the version you claim is "orders of magnitude slower" takes less than 300 µs (that's *micro* seconds), so how long does the *fast* version take? Does it go backwards in time? Most high-performance JVMs need about 5 *seconds* just to warm up their caches and stuff, before they even reach full speed. (The C2 compiler in the HotSpot JVM, which is the optimizing compiler, doesn't even compile a method until it has run 20000 times.) — Jörg W Mittag, Jan 25 '11 at 16:22
The second version seems to be around 3 times faster (by my totally unscientific measurement using `(1 to 10000000)` in the REPL). I wouldn't call that "orders of magnitude", but still. — Knut Arne Vedaa, Jan 25 '11 at 16:33
@Jörg: you can see the running times at his ideone link, but I'll edit that information into the question so it's not lost if the ideone link ever goes away. — Ken Bloom, Jan 25 '11 at 16:57
@Jörg: please post your comment about the HotSpot JVM as an answer. — Ken Bloom, Jan 25 '11 at 17:00
You can learn about this and many other noteworthy things from this great document: http://www.scala-lang.org/docu/files/collections-api/collections.html Everybody should read this before coding. I know I should have. — Raphael, Jan 25 '11 at 17:12
Time of the second version here is only two third of the first version, so it doesn't even achieve twice as fast status. — Daniel C. Sobral, Jan 25 '11 at 17:22
Seeing the code on IDEOne I noted some serious problems. There's a single iteration of each code, and the "slower" version comes first. In practice, the inner loop of the slower version is "training" the JVM for the common functions that will be later called by the "faster" version. See here, where I just inverted the two pieces of code, which one is faster: http://ideone.com/7RrlQ — Daniel C. Sobral, Jan 25 '11 at 18:07
@Jörg Hah! Look at my link above. The way he did it, the "slower" version actually trained the JVM for the "faster" one. Invert the two algorithms, and see the faster become slower and vice versa! — Daniel C. Sobral, Jan 25 '11 at 18:09

Daniel C. Sobral · Accepted Answer · 2011-01-25T20:41:58.020

The problem, as others have said, is that filter creates a new collection. The alternative withFilter doesn't, but that doesn't have a foldLeft. Also, using .view, .iterator or .toStream would all avoid creating the new collection in various ways, but they are all slower here than the first method you used, which I thought somewhat strange at first.

But, then... See, 1 until 1000 is a Range, whose size is actually very small, because it doesn't store each element. Also, Range's foreach is extremely optimized, and is even specialized, which is not the case of any of the other collections. Since foldLeft is implemented as a foreach, as long as you stay with a Range you get to enjoy its optimized methods.

(_: Range).foreach:

@inline final override def foreach[@specialized(Unit) U](f: Int => U) {
    if (length > 0) {
        val last = this.last
        var i = start
        while (i != last) {
            f(i)
            i += step
        }
        f(i)
    }
}

(_: Range).view.foreach

def foreach[U](f: A => U): Unit = 
    iterator.foreach(f)

(_: Range).view.iterator

override def iterator: Iterator[A] = new Elements(0, length)

protected class Elements(start: Int, end: Int) extends BufferedIterator[A] with Serializable {
  private var i = start

  def hasNext: Boolean = i < end

  def next: A = 
    if (i < end) {
      val x = self(i)
      i += 1
      x
    } else Iterator.empty.next

  def head = 
    if (i < end) self(i) else Iterator.empty.next

  /** $super
   *  '''Note:''' `drop` is overridden to enable fast searching in the middle of indexed sequences.
   */
  override def drop(n: Int): Iterator[A] =
    if (n > 0) new Elements(i + n, end) else this

  /** $super
   *  '''Note:''' `take` is overridden to be symmetric to `drop`.
   */
  override def take(n: Int): Iterator[A] =
    if (n <= 0) Iterator.empty.buffered
    else if (i + n < end) new Elements(i, i + n) 
    else this
}

(_: Range).view.iterator.foreach

def foreach[U](f: A =>  U) { while (hasNext) f(next()) }

And that, of course, doesn't even count the filter between view and foldLeft:

override def filter(p: A => Boolean): This = newFiltered(p).asInstanceOf[This]

protected def newFiltered(p: A => Boolean): Transformed[A] = new Filtered { val pred = p }

trait Filtered extends Transformed[A] {
  protected[this] val pred: A => Boolean 
  override def foreach[U](f: A => U) {
    for (x <- self)
      if (pred(x)) f(x)
  }
  override def stringPrefix = self.stringPrefix+"F"
}

For what it's worth... This gets my vote for best answer so far. — Kevin Wright, Jan 25 '11 at 18:22
Agreed. I gave it to your first Kevin, and I apologize for changing it, but it is this answer that has helped me understand what's going on. — andyczerwonka, Jan 25 '11 at 19:44
@arcticpenguin - Quite right too, I wouldn't have it any other way! — Kevin Wright, Jan 25 '11 at 20:44

Kevin Wright · Answer 2 · 2011-01-25T17:09:00.340

12

Try making the collection lazy first, so

(1 until 1000).view.filter...

instead of

(1 until 1000).filter...

That should avoid the cost of building an intermediate collection. You might also get better performance from using sum instead of foldLeft(0)(_ + _), it's always possible that some collection type might have a more efficient way to sum numbers. If not, it's still cleaner and more declarative...

edited Jan 25 '11 at 17:09

answered Jan 25 '11 at 17:03

Kevin Wright

49,540
9
105
155

Beat me to it. Nice touch with `sum`. – Raphael Jan 25 '11 at 17:10
Once I warm up the JVM, the difference is less than the original post. Adding the view brings it down even more, to where they're basically the same. – andyczerwonka Jan 25 '11 at 17:54
6

I don't know about you, but my tests on the trunk I have here makes it actually slower than without the `view`. – Daniel C. Sobral Jan 25 '11 at 17:55
@arcticpenguin Curious, I'm just not seeing that here. In fact, the version with `view` is slower here than without. – Daniel C. Sobral Jan 25 '11 at 17:59
admittedly, I didn't test on this exact problem, but I've found that `view` usually helps. Of course, if the jvm is cold then all bets are off... – Kevin Wright Jan 25 '11 at 18:13
@arcticpenguin - are you sure you're warming what you think you're warming? Move the test code into a dedicated method so you can be sure that the code under test really has been optimized. – Kevin Wright Jan 25 '11 at 18:17
0ms vs 0ms? That's not just warm, it's positively sizzling! – Kevin Wright Jan 25 '11 at 20:52

sblundy · Answer 3 · 2011-01-25T16:23:46.987

3

Looking through the code, it looks like filter does build a new Seq on which the foldLeft is called. The second skips that bit. It's not so much the memory, although that can't but help, but that the filtered collection is never built at all. All that work is never done.

Range uses TranversableLike.filter, which looks like this:

def filter(p: A => Boolean): Repr = {
  val b = newBuilder
  for (x <- this) 
    if (p(x)) b += x
  b.result
}

I think it's the += on line 4 that's the difference. Filtering in foldLeft eliminates it.

edited Jan 25 '11 at 16:23

answered Jan 25 '11 at 16:13

sblundy

60,628
22
121
123

1

Interesting. The GHC Haskell compiler would probably perform some kind of stream fusion here, and essentially turn the first version into the second one on its own. Unfortunately, stuff like this is really hard to achieve for an impure language like Scala (especially if you add dynamic method dispatch into the mix). The compiler would probably have to prove that both the blocks supplied to `filter` and `foldLeft` are referentially transparent, as well as prove that there's no funny subclassing going on. – Jörg W Mittag Jan 25 '11 at 16:28
@Jörg W Mittag: True. One wonders if calling toStream on the Range would help. Probably not. But we're well into micro-optimization land here. – sblundy Jan 25 '11 at 16:40

Raphael · Answer 4 · 2011-01-25T18:55:46.540

1

filter creates a whole new sequence on which then foldLeft is called. Try:

(1 until 1000).view.filter(i => (i % 3 == 0 || i % 5 == 0)).reduceLeft(_+_)

This will prevent said effect, merely wrapping the original thing. Exchanging foldLeft with reduceLeft is only cosmetic (in this case).

edited Jan 25 '11 at 18:55

answered Jan 25 '11 at 17:09

Raphael

9,779
5
63
94

2

Note: exchanging `foldLeft` with `reduceLeft` is only cosmetic _here_ because you happen to know _a priori_ that the list is non-empty. In general, you need to fold to avoid a potential exception. – Aaron Novstrup Jan 25 '11 at 18:51

score 1 · Answer 5 · answered Jan 30 '11 at 04:28

1

Now the challenge is, can you think of a yet more efficient way? Not that your solution is too slow in this case, but how well does it scale? What if instead of 1000, it was 1000000000? There is a solution that could compute the latter case just as quickly as the former.

answered Jan 30 '11 at 04:28

Ray

4,829
4
28
55

2

`def arithProg(a:Int, d:Int, n:Int): Long = n * (2 * a + (n - 1) * d.toLong) / 2; def find(n: Int):Long = arithProg (3, 3, n/3) + arithProg (5, 5, n/5) - arithProg (15, 15, n/15); println(find(1000000000 - 1))` What do I win? – Luigi Plinge Jun 08 '11 at 04:55

Why is filter in front of foldLeft slow in Scala?

5 Answers5