18

In Scala collections, if one wants to iterate over a collection without returning results (i.e. performing a side effect on every element of the collection), it can be done either with

final def foreach(f: (A) ⇒ Unit): Unit

or

final def map[B](f: (A) ⇒ B): SomeCollectionClass[B]

With the exception of possible lazy mapping(*), from an end-user perspective, I see zero differences in these invocations:

myCollection.foreach { element =>
  doStuffWithElement(element);
}

myCollection.map { element =>
  doStuffWithElement(element);
}

given that I can just ignore what map outputs. I can't think of any specific reason why two different methods should exist and be used when map seems to include all the functionality of foreach. In fact, I would be quite surprised if an intelligent compiler and VM did not optimize away the creation of that collection object, given that it is never assigned, read, or used anywhere.

So the question is: am I right that there is no reason to call foreach anywhere in one's code?

Notes:

(*) The lazy mapping concept, as thoroughly illustrated in this question, might change things a bit and justify the usage of foreach, but as far as I can see, one specifically needs to stumble upon a LazyMap, normal

(**) If one is not using a collection but writing one, one quickly stumbles upon the fact that the for comprehension syntax is in fact syntactic sugar that generates a "foreach" call, i.e. these two lines generate fully equivalent code:

for (element <- myCollection) { doStuffWithElement(element); }
myCollection.foreach { element => doStuffWithElement(element); }

So if one cares about other people using that collection class with the for syntax, one might still want to implement the foreach method.
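
As a minimal sketch of note (**), the class below defines only foreach, and that is enough for the for syntax (without yield) to work on it. The names `ForDesugarDemo`, `Countdown` and `collect` are made up for illustration and do not come from any library.

```scala
object ForDesugarDemo {
  // Hypothetical collection-like class that implements only foreach.
  final class Countdown(from: Int) {
    def foreach[U](f: Int => U): Unit = {
      var i = from
      while (i > 0) { f(i); i -= 1 }
    }
  }

  // The for loop below is rewritten by the compiler into a call to
  // Countdown.foreach, so no map or flatMap is needed.
  def collect(): List[Int] = {
    val seen = scala.collection.mutable.ListBuffer.empty[Int]
    for (n <- new Countdown(3)) { seen += n }
    seen.toList
  }

  def main(args: Array[String]): Unit =
    println(collect()) // prints List(3, 2, 1)
}
```

Adding `yield` to the loop would require a map method as well, which this class deliberately lacks.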

GreyCat
  • It's nice to use `foreach` instead of `map` to differentiate between side-effecting and non-side-effecting functions. I don't care if the compiler optimizes one for the other. The difference makes the purpose of the code more apparent. – Michael Zajac Sep 10 '14 at 00:05
  • To go one step further than LimbSoup, I prefer to use `for ..` (sans yield) instead of `.foreach` directly, as it further (to me) shows the divide between iteration *for* side-effects and a transformation of the sequence. Also, it can be dangerous to assume the sequence being iterated is not lazy. – user2864740 Sep 10 '14 at 00:09
  • In contrast to you, *I* would be strongly impressed if the current Scala compiler and/or the JVM are able to optimize the `map` to the `foreach` code if the result is not used. This appears to me as an **extremely** non-trivial optimization. – ziggystar Sep 10 '14 at 12:04

3 Answers

17

I can think of a couple motivations:

  1. When foreach is the last line of a method whose return type is Unit, the compiler gives no warning, but it does with map (provided -Ywarn-value-discard is on). Also, with map you sometimes get "warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses", which you would not get with foreach.
  2. General readability: a reader can tell at a glance that you're mutating some state without returning anything, whereas greater cognitive effort is required to understand the same operation written with map.
  3. Further to 1, you also get type checking when passing named functions around and then into map and foreach.
  4. Using foreach won't build a new list, so it will be more efficient (thanks @Vishnu).
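
Point 4 can be sketched roughly as follows, assuming a strict List. The object and method names (`ForeachVsMapDemo`, `run`) are invented for this example. Both calls run the side effect once per element, but map also hands back a freshly built list of Unit values that serves no purpose.

```scala
object ForeachVsMapDemo {
  def run(): (List[Unit], Int) = {
    var calls = 0
    val xs = List(1, 2, 3)

    // map runs the side effect and also builds a List[Unit] as its result.
    val mapped: List[Unit] = xs.map { _ => calls += 1 }

    // foreach runs the same side effect and returns Unit; no list is built.
    xs.foreach { _ => calls += 1 }

    (mapped, calls)
  }

  def main(args: Array[String]): Unit = {
    val (mapped, calls) = run()
    println(mapped) // prints List((), (), ())
    println(calls)  // prints 6
  }
}
```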
samthebest
  • correct me if I am wrong, but `Map` returns an immutable object, while `foreach` has a return type of `Unit`. So `foreach` is more memory efficient if your intent is just to read the content. Right? – Vishnu Dec 27 '18 at 06:13
  • Yup, Thanks @Vishnu – samthebest Jan 04 '19 at 12:38
  • well, actually I think map returns the same type of collection as it was applied to. So if you apply it to a mutable collection, you should get back a new mutable collection. – Pietrotull Mar 06 '20 at 09:19
8
scala> (1 to 5).iterator map println
res0: Iterator[Unit] = non-empty iterator

scala> (1 to 5).iterator foreach println
1
2
3
4
5
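
The same behaviour can be checked outside the REPL with a counter. This is only a sketch, and the names `LazyIteratorDemo` and `run` are made up: map on an iterator does nothing until the result is consumed, while foreach runs immediately.

```scala
object LazyIteratorDemo {
  def run(): (Int, Int, Int) = {
    var calls = 0

    // Iterator.map is lazy: the function has not been called yet.
    val mapped = Iterator(1, 2, 3).map { x => calls += 1; x }
    val afterMap = calls

    // foreach is strict: the function runs once per element right away.
    Iterator(1, 2, 3).foreach { _ => calls += 1 }
    val afterForeach = calls

    // Consuming the mapped iterator finally triggers the deferred calls.
    mapped.foreach(_ => ())
    val afterConsume = calls

    (afterMap, afterForeach, afterConsume)
  }

  def main(args: Array[String]): Unit =
    println(run()) // prints (0,3,6)
}
```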
Chris Martin
  • That's exactly what I mentioned in (*) - it's lazy mapping, which is the default on all iterators. – GreyCat Sep 10 '14 at 00:37
  • So your assertion in this question is that `foreach` and `map` are the same except when they're not? – Chris Martin Sep 10 '14 at 01:07
  • My assertion is that "they are the same except for clearly outlined cases (*) and (**)", both of which, in my opinion, are fairly rare in average end-user practice. – GreyCat Sep 10 '14 at 11:33
4

I'd be impressed if the builder machinery could be optimized away.

scala> :pa
// Entering paste mode (ctrl-D to finish)

implicit val cbf = new collection.generic.CanBuildFrom[List[Int],Int,List[Int]] {
  def apply() = new collection.mutable.Builder[Int, List[Int]] {
    val b = new collection.mutable.ListBuffer[Int]
    override def +=(i: Int) = { println(s"Adding $i") ; b +=(i) ; this }
    override def clear() = () ; override def result() = b.result() }
  def apply(from: List[Int]) = apply() }

// Exiting paste mode, now interpreting.

cbf: scala.collection.generic.CanBuildFrom[List[Int],Int,List[Int]] = $anon$2@e3cee7b

scala> List(1,2,3) map (_ + 1)
Adding 2
Adding 3
Adding 4
res1: List[Int] = List(2, 3, 4)

scala> List(1,2,3) foreach (_ + 1)
som-snytt
  • As far as I can see, `CanBuildFrom`-related implicit stuff would generate here a hierarchy of classes on JVM level, including the one that would include `+=` method that actually does something besides constructing a variable that would be ultimately thrown away (and thus all that code could be safely skipped). Of course, a method that *does* something, especially when that something is a clear `invokevirtual` call, won't be optimized away. – GreyCat Sep 10 '14 at 11:37