84

Is it always more performant to use withFilter instead of filter when functions like map, flatMap, etc. are applied afterwards?

Why are only map, flatMap and foreach supported? (I expected functions like forall/exists as well.)

Jämes
Kigyo
  • [This part of Scala doc](https://docs.scala-lang.org/tutorials/FAQ/yield.html) also has a detailed explanation. – ohkts11 Mar 16 '19 at 16:07

6 Answers

127

From the Scala docs:

Note: the difference between c filter p and c withFilter p is that the former creates a new collection, whereas the latter only restricts the domain of subsequent map, flatMap, foreach, and withFilter operations.

So filter will take the original collection and produce a new collection, but withFilter will non-strictly (i.e. lazily) pass unfiltered values through to later map/flatMap/withFilter calls, saving a second pass through the (filtered) collection. Hence it will be more efficient when passing through to these subsequent method calls.

In fact, withFilter is specifically designed for working with chains of these methods, which is what a for comprehension is de-sugared into. No other methods (such as forall/exists) are required for this, so they have not been added to the FilterMonadic return type of withFilter.
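The difference can be made visible by counting how often the predicate runs. This is only a minimal sketch, not something taken from the Scala docs; the names `WithFilterDemo`, `strictCalls` and `lazyCalls` are made up for illustration.

```scala
object WithFilterDemo {
  val xs = List(1, 2, 3, 4, 5, 6)

  // filter is strict: it runs the predicate on every element right away
  // and builds an intermediate List.
  var strictCalls = 0
  val evens: List[Int] = xs.filter { x => strictCalls += 1; x % 2 == 0 }
  val strictCallsAfterFilter: Int = strictCalls // 6

  // withFilter only records the predicate; nothing has been evaluated yet.
  var lazyCalls = 0
  val restricted = xs.withFilter { x => lazyCalls += 1; x % 2 == 0 }
  val lazyCallsBeforeMap: Int = lazyCalls // 0

  // The predicate runs while map traverses xs, in a single pass.
  val doubled = restricted.map(_ * 2)
  val lazyCallsAfterMap: Int = lazyCalls // 6

  def main(args: Array[String]): Unit = {
    println(evens.map(_ * 2)) // List(4, 8, 12), via an intermediate List
    println(doubled)          // List(4, 8, 12), without one
  }
}
```

Both variants call the predicate the same number of times; what withFilter saves is the intermediate collection and the second traversal.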

Community
Shadowlands
  • Hope they still add these methods some day. – Kigyo Oct 27 '13 at 15:38
  • @Kigyo I don't think you're supposed to use withFilter yourself (apart from implicitly within for-expressions). Use `view` if you want maps / filters to be lazy. – Luigi Plinge Oct 27 '13 at 22:12
  • I see. What's the exact difference between `view` and `withFilter`? Why isn't view used for `for-loops`? – Kigyo Oct 28 '13 at 02:00
  • @Kigyo `view.filter` will behave identically to `withFilter` on types that provide `view`, i.e. collections. But a view is a rich interface, and for comprehensions are used with many non-collection types, e.g. Option, Future. `withFilter` is all that's needed for an efficient translation, so it makes sense to require only that in the contract. – Joe Halliwell Jun 17 '15 at 12:54
  • Just for reference, I think that [Collections - Tips and Tricks](https://pavelfatin.com/scala-collections-tips-and-tricks/#sequences-rewriting) provides outstanding information. H5s aren't anchored, but you can search for `Don’t create temporary collections` in the linked section. – sthzg Jan 17 '16 at 10:37
  • I'm not sure what you mean by "...pass unfiltered values...". The values passed on are certainly *filtered*, are they not - just lazily passed through. – nclark Jun 12 '16 at 13:45
  • @nclark I just mean values that did not get filtered out, values that passed the filter criterion. – Shadowlands Jun 13 '16 at 23:10
  • Regarding the explicit use of `withFilter`, Martin Odersky himself uses it explicitly in his Scala courses on Coursera, which I highly recommend. Given that he does so, it may give others comfort with doing so as well, although the difference is typically only 1 character. For example `seq.view filter p` vs. `seq withFilter p`. – Chuck Daniels Aug 15 '16 at 12:44
  • @ChuckDaniels - Thanks. And further, the comment on withFilter as of Scala 2.11 doesn't say not to use it. – Don Branson Feb 23 '18 at 15:49
14

In addition to Shadowlands' excellent answer, I would like to offer an intuitive example of the difference between filter and withFilter.

Let's consider the following code:

val list = List(1, 2, 3)
var go = true
val result = for(i <- list; if(go)) yield {
   go = false
   i
}

Most people expect result to be equal to List(1). This has been the case since Scala 2.8, because the for-comprehension is translated into:

val result = list withFilter {
  case i => go
} map {
  case i => {
    go = false
    i
  }
}

As you can see, the translation converts the condition into a call to withFilter. Prior to Scala 2.8, for-comprehensions were translated into something like the following:

val r2 = list filter {
  case i => go
} map {
  case i => {
    go = false
    i
  }
}

Using filter, the value of r2 is quite different: List(1, 2, 3). Setting the go flag to false has no effect on the filter, because the filtering is already complete before map runs. Since Scala 2.8, this issue is solved by using withFilter: the condition is evaluated each time an element is accessed inside the map method.

References:

  • Scala in Action (covers Scala 2.10), Nilanjan Raychaudhuri, Manning Publications, p. 120
  • Odersky's thoughts about for-comprehension translation
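To see the interleaving directly, one can log the order in which the predicate and the mapping function run. This is a small sketch of my own, not from the book; `EvaluationOrder`, `trace`, `p` and `f` are illustrative names.

```scala
import scala.collection.mutable.ListBuffer

object EvaluationOrder {
  val list = List(1, 2, 3)

  // Records "p(i)" when the predicate runs and "f(i)" when the mapping runs.
  def trace(useWithFilter: Boolean): List[String] = {
    val log = ListBuffer.empty[String]
    val p = (i: Int) => { log += s"p($i)"; true }
    val f = (i: Int) => { log += s"f($i)"; i }
    if (useWithFilter) list.withFilter(p).map(f) else list.filter(p).map(f)
    log.toList
  }

  def main(args: Array[String]): Unit = {
    // filter: every predicate call happens before the first mapping call
    println(trace(useWithFilter = false)) // List(p(1), p(2), p(3), f(1), f(2), f(3))
    // withFilter: predicate and mapping alternate element by element
    println(trace(useWithFilter = true))  // List(p(1), f(1), p(2), f(2), p(3), f(3))
  }
}
```

Because the predicate for element 2 runs only after the mapping function has handled element 1, a side effect such as go = false is visible to the later predicate calls.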

Jämes
1

The main reason forall/exists aren't implemented is that the intended use case is:

  • you can lazily apply withFilter to an infinite stream/iterable
  • you can lazily apply another withFilter (and again and again)

To implement forall/exists we need to obtain all the elements, losing the laziness.

So for example:

import scala.collection.AbstractIterator

class RandomIntIterator extends AbstractIterator[Int] {
  val rand = new java.util.Random
  def next(): Int = rand.nextInt()
  def hasNext: Boolean = true
}

//rand_integers  is an infinite random integers iterator
val rand_integers = new RandomIntIterator

val rand_naturals = 
    rand_integers.withFilter(_ > 0)

val rand_even_naturals = 
    rand_naturals.withFilter(_ % 2 == 0)

println(rand_even_naturals.map(identity).take(10).toList)

//calling a second time we get
//another ten-tuple of random even naturals
println(rand_even_naturals.map(identity).take(10).toList)

Note that rand_even_naturals.map(identity).take(10) is still an iterator. Only when we call toList are the random numbers generated and filtered in a chain.

Note that map(identity) is equivalent to map(i => i), and it is used here to convert a withFilter object back to the original type (e.g. a collection, a stream, an iterator).
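Since random output is hard to check, here is a deterministic variant of the same idea. It is only a sketch under my own assumptions: it swaps the random source for `Iterator.from(1)`, and `LazyIteratorDemo`, `naturals` and `multiplesOfSix` are made-up names.

```scala
object LazyIteratorDemo {
  // Infinite, deterministic source: 1, 2, 3, ...
  def naturals: Iterator[Int] = Iterator.from(1)

  // Two chained withFilter calls; nothing is computed at this point.
  def multiplesOfSix: Iterator[Int] =
    naturals.withFilter(_ % 2 == 0).withFilter(_ % 3 == 0)

  // Only take(5).toList forces evaluation, and only as far as needed.
  val firstFive: List[Int] = multiplesOfSix.map(identity).take(5).toList

  def main(args: Array[String]): Unit =
    println(firstFive) // List(6, 12, 18, 24, 30)
}
```

As far as I can tell, on Iterator specifically withFilter already returns an Iterator, so map(identity) is harmless there but not strictly required; it matters for strict collections such as List.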

frhack
1

For the forall/exists part:

someList.filter(conditionA).forall(conditionB)

would be the same as (though a little bit un-intuitive)

!someList.exists(x => conditionA(x) && !conditionB(x))

Similarly, .filter().exists() can be combined into a single exists() check.
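A quick sketch to check these equivalences; `conditionA` and `conditionB` are placeholder predicates that I picked arbitrarily, and the last method shows the `view` alternative mentioned in the comments above.

```scala
object CombinedChecks {
  // Hypothetical predicates, chosen only for illustration.
  val conditionA: Int => Boolean = _ % 2 == 0
  val conditionB: Int => Boolean = _ > 1

  // filter + forall, and the single-pass rewrite using exists
  def filterForall(xs: List[Int]): Boolean =
    xs.filter(conditionA).forall(conditionB)
  def notExists(xs: List[Int]): Boolean =
    !xs.exists(x => conditionA(x) && !conditionB(x))

  // filter + exists, and the single-pass rewrite
  def filterExists(xs: List[Int]): Boolean =
    xs.filter(conditionA).exists(conditionB)
  def singleExists(xs: List[Int]): Boolean =
    xs.exists(x => conditionA(x) && conditionB(x))

  // Lazy alternative that keeps the two predicates separate
  def viewForall(xs: List[Int]): Boolean =
    xs.view.filter(conditionA).forall(conditionB)

  def main(args: Array[String]): Unit = {
    val samples = List(List(1, 2, 3, 4), List(0, 2, 3), List(1, 3, 5), Nil)
    for (xs <- samples)
      println((xs, filterForall(xs), notExists(xs), viewForall(xs),
               filterExists(xs), singleExists(xs)))
  }
}
```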

lznt
-3

Using for/yield can be a workaround, for example:

for {
  e <- col
  if e.isNotEmpty
} yield e.get(0)
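
A runnable version of this idea might look like the following. It is a sketch under assumptions: `col` is a hypothetical `List[List[Int]]`, and the standard `nonEmpty` / `head` stand in for the `isNotEmpty` / `get(0)` used above.

```scala
object ForYieldWorkaround {
  // Hypothetical input data.
  val col: List[List[Int]] = List(List(1, 2), Nil, List(3))

  // The for comprehension with a guard ...
  val viaFor: List[Int] = for { e <- col if e.nonEmpty } yield e.head

  // ... is translated into a withFilter followed by a map.
  val viaWithFilter: List[Int] = col.withFilter(e => e.nonEmpty).map(e => e.head)

  // The result is an ordinary List again, so forall / exists are available.
  val allPositive: Boolean = viaFor.forall(_ > 0)

  def main(args: Array[String]): Unit = {
    println(viaFor)        // List(1, 3)
    println(viaWithFilter) // List(1, 3)
    println(allPositive)   // true
  }
}
```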
-5

As a workaround, you can implement other functions with only map and flatMap.
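One possible reading of this, sketched with foreach rather than map/flatMap because it is shorter: the helper names `existsAfterFilter` and `forallAfterFilter` are made up, and unlike the real exists/forall they do not stop early.

```scala
object ExistsViaForeach {
  // "exists" built only from withFilter + foreach; it always traverses
  // the whole list, so it is not suitable for infinite sources.
  def existsAfterFilter[A](xs: List[A])(p: A => Boolean)(q: A => Boolean): Boolean = {
    var found = false
    xs.withFilter(p).foreach(x => if (q(x)) found = true)
    found
  }

  // forall expressed through exists: no filtered element violates q.
  def forallAfterFilter[A](xs: List[A])(p: A => Boolean)(q: A => Boolean): Boolean =
    !existsAfterFilter(xs)(p)(x => !q(x))

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4)
    println(existsAfterFilter(xs)(_ % 2 == 0)(_ > 3)) // true  (4 is even and > 3)
    println(forallAfterFilter(xs)(_ % 2 == 0)(_ > 3)) // false (2 is even but not > 3)
  }
}
```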

Moreover, this optimisation is useless on small collections…

Yann Moisan