8

I'm new to Scala and trying to figure out the best way to filter & map a collection. Here's a toy example to explain my problem.

Approach 1: This is pretty bad since I'm iterating through the list twice and calculating the same value in each iteration.

val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums filter { x: Int => (x * x) > N } map { x: Int => (x * x).toString }

Approach 2: This is slightly better but I still need to calculate (x * x) twice.

val N = 5
val nums = 0 until 10
val sqNumsLargerThanN = nums collect { case x: Int if (x * x) > N => (x * x).toString }

So, is it possible to calculate this without iterating through the collection twice and avoid repeating the same calculations?

royhowie
  • 11,075
  • 14
  • 50
  • 67
Can Bal
  • 1,533
  • 1
  • 13
  • 26

9 Answers9

7

Could use a foldRight

nums.foldRight(List.empty[Int]) {
  case (i, is) =>
    val s = i * i
    if (s > N) s :: is else is
  }

A foldLeft would also achieve a similar goal, but the resulting list would be in reverse order (due to the associativity of foldLeft.

Alternatively if you'd like to play with Scalaz

import scalaz.std.list._
import scalaz.syntax.foldable._

nums.foldMap { i =>
  val s = i * i
  if (s > N) List(s) else List()
}
adelbertc
  • 7,270
  • 11
  • 47
  • 70
  • Note that with the default `foldRight` you will overflow your stack if your list is more than a thousand or so elements long. Also, the Scalaz version doesn't have any advantage over a `flatMap`. – Rex Kerr Jun 16 '15 at 00:48
5

The typical approach is to use an iterator (if possible) or view (if iterator won't work). This doesn't exactly avoid two traversals, but it does avoid creation of a full-sized intermediate collection. You then map first and filter afterwards and then map again if needed:

xs.iterator.map(x => x*x).filter(_ > N).map(_.toString)

The advantage of this approach is that it's really easy to read and, since there are no intermediate collections, it's reasonably efficient.

If you are asking because this is a performance bottleneck, then the answer is usually to write a tail-recursive function or use the old-style while loop method. For instance, in your case

def sumSqBigN(xs: Array[Int], N: Int): Array[String] = {
  val ysb = Array.newBuilder[String]
  def inner(start: Int): Array[String] = {
    if (start >= xs.length) ysb.result
    else {
      val sq = xs(start) * xs(start)
      if (sq > N) ysb += sq.toString
      inner(start + 1)
    }
  }
  inner(0)
}

You can also pass a parameter forward in inner instead of using an external builder (especially useful for sums).

Rex Kerr
  • 166,841
  • 26
  • 322
  • 407
  • Hi Rex - what do you mean by it doesn't exactly avoid two traversals? – sourcedelica Jun 15 '15 at 23:43
  • @sourcedelica - Each iterator, when walking itself through the list, also (necessarily) walks the previous iterators through. So they all traverse in lock-step, but if you map, then filter, then map, you actually have next/hasNext calls nested three deep. – Rex Kerr Jun 16 '15 at 00:46
4

I have yet to confirm that this is truly a single pass, but:

  val sqNumsLargerThanN = nums flatMap { x =>
    val square = x * x
    if (square > N) Some(x) else None
  }
triggerNZ
  • 4,221
  • 3
  • 28
  • 34
  • I wanna ask will the loading of wrapping each element for a Option Layer be slighter than calculating x * x twice? The Option object creation cost can be ignored? (I'm new to Scala from C++.) – Chen OT Jun 15 '15 at 01:42
  • 1
    To answer your question directly, no, the Option allocation is not free. It is cheap though. The JVM GC has gotten very good over the years at allocating and collecting small objects in loops. So while not free, this is almost never the place where I would start optimising. – triggerNZ Jun 15 '15 at 01:47
  • 2
    Furthermore, I should mention that while this is a fun puzzle to solve, trying to minimise the number of passes over a collection in the world of functional programming is usually not the best way of gaining performance. These things are common in the C/C++ world and much less common on the JVM. Having said that, let's assume your collection is huge, say, 8GB. Then you do really want to pass only once, and I would stick with collect, or the use of lazy collections. The double multiplication will be optimised away by the JIT – triggerNZ Jun 15 '15 at 01:50
  • 1
    It is a single pass, but it's kind of cheating because you create a second collection's worth of stuff as Options. (And a third, even, because you can't `flatMap` it without an implicit conversion to an `Iterable`.) So it's usually more heavyweight than an `iterator`, even if in some sense it technically is single pass in a way that `iterator`s are not. – Rex Kerr Jun 15 '15 at 02:07
  • Yeah it did feel like cheating :) – triggerNZ Jun 15 '15 at 02:11
3

You can use collect which applies a partial function to every value of the collection that it's defined at. Your example could be rewritten as follows:

val sqNumsLargerThanN = nums collect {
    case (x: Int) if (x * x) > N => (x * x).toString
}
Gavin Schulz
  • 4,548
  • 4
  • 23
  • 32
  • Why did someone down-vote this answer? `collect` seems like a very idiomatic way to do this. – Michael Zajac Jun 15 '15 at 12:42
  • Isn't this exactly the same as my "Approach 2"? – Can Bal Jun 16 '15 at 00:38
  • Yes, it is the same as Approach#2 above, and going by the definition of _collect_ , this one seems perfectly reasonable to me; it says exactly what it does. This is not to say other approaches elucidated above are better or worse though. – Nirmalya Jul 13 '15 at 02:32
3

A very simple approach that only does the multiplication operation once. It's also lazy, so it will be executing code only when needed.

nums.view.map(x=>x*x).withFilter(x => x> N).map(_.toString)

Take a look here for differences between filter and withFilter.

Community
  • 1
  • 1
marios
  • 8,874
  • 3
  • 38
  • 62
  • This is very interesting. In the thread you linked to, there is a comment "I don't think you're supposed to use withFilter yourself (apart from implicitly within for-expressions)". Is there a reason not to use `withFilter` – Can Bal Jun 16 '15 at 00:35
  • I use `filter` only when I want to create a new collection to use further down the road. If I just want a filter as an intermediate step of a pipeline of operations, I always use `withFilter`. – marios Jun 16 '15 at 03:57
2

Consider this for comprehension,

  for (x <- 0 until 10; v = x*x if v > N) yield v.toString

which unfolds to a flatMap over the range and a (lazy) withFilter onto the once only calculated square, and yields a collection with filtered results. To note one iteration and one calculation of square is required (in addition to creating the range).

elm
  • 20,117
  • 14
  • 67
  • 113
0

You can use flatMap.

val sqNumsLargerThanN = nums flatMap { x =>
  val square = x * x
  if (square > N) Some(square.toString) else None
}

Or with Scalaz,

import scalaz.Scalaz._

val sqNumsLargerThanN = nums flatMap { x =>
  val square = x * x
  (square > N).option(square.toString)
}

The solves the asked question of how to do this with one iteration. This can be useful when streaming data, like with an Iterator.

However...if you are instead wanting the absolute fastest implementation, this is not it. In fact, I suspect you would use a mutable ArrayList and a while loop. But only after profiling would you know for sure. In any case, that's for another question.

Paul Draper
  • 78,542
  • 46
  • 206
  • 285
0

Using a for comprehension would work:

val sqNumsLargerThanN = for {x <- nums if x*x > N } yield (x*x).toString

Also, I'm not sure but I think the scala compiler is smart about a filter before a map and will only do 1 pass if possible.

Ramón J Romero y Vigil
  • 17,373
  • 7
  • 77
  • 125
-2

I am also beginner did it as follows

 for(y<-(num.map(x=>x*x)) if y>5 ) { println(y)}
gauri
  • 1
  • 1
  • 1