97

I really don't seem to be understanding Map and FlatMap. What I am failing to understand is how a for-comprehension is a sequence of nested calls to map and flatMap. The following example is from Functional Programming in Scala

def bothMatch(pat:String,pat2:String,s:String):Option[Boolean] = for {
            f <- mkMatcher(pat)
            g <- mkMatcher(pat2)
 } yield f(s) && g(s)

translates to

def bothMatch(pat:String,pat2:String,s:String):Option[Boolean] = 
         mkMatcher(pat) flatMap (f => 
         mkMatcher(pat2) map (g => f(s) && g(s)))

The mkMatcher method is defined as follows:

  def mkMatcher(pat:String):Option[String => Boolean] = 
             pattern(pat) map (p => (s:String) => p.matcher(s).matches)

And the pattern method is as follows:

import java.util.regex._

def pattern(s:String):Option[Pattern] = 
  try {
        Some(Pattern.compile(s))
   }catch{
       case e: PatternSyntaxException => None
   }

It will be great if someone could shed some light on the rationale behind using map and flatMap here.

sc_ray
  • 7,803
  • 11
  • 63
  • 100

5 Answers5

227

TL;DR go directly to the final example

I'll try and recap.

Definitions

The for comprehension is a syntax shortcut to combine flatMap and map in a way that's easy to read and reason about.

Let's simplify things a bit and assume that every class that provides both aforementioned methods can be called a monad and we'll use the symbol M[A] to mean a monad with an inner type A.

Examples

Some commonly seen monads include:

  • List[String] where
    • M[X] = List[X]
    • A = String
  • Option[Int] where
    • M[X] = Option[X]
    • A = Int
  • Future[String => Boolean] where
    • M[X] = Future[X]
    • A = (String => Boolean)

map and flatMap

Defined in a generic monad M[A]

 /* applies a transformation of the monad "content" mantaining the 
  * monad "external shape"  
  * i.e. a List remains a List and an Option remains an Option 
  * but the inner type changes
  */
  def map(f: A => B): M[B] 

 /* applies a transformation of the monad "content" by composing
  * this monad with an operation resulting in another monad instance 
  * of the same type
  */
  def flatMap(f: A => M[B]): M[B]

e.g.

  val list = List("neo", "smith", "trinity")

  //converts each character of the string to its corresponding code
  val f: String => List[Int] = s => s.map(_.toInt).toList 

  list map f
  >> List(List(110, 101, 111), List(115, 109, 105, 116, 104), List(116, 114, 105, 110, 105, 116, 121))

  list flatMap f
  >> List(110, 101, 111, 115, 109, 105, 116, 104, 116, 114, 105, 110, 105, 116, 121)

for expression

  1. Each line in the expression using the <- symbol is translated to a flatMap call, except for the last line which is translated to a concluding map call, where the "bound symbol" on the left-hand side is passed as the parameter to the argument function (what we previously called f: A => M[B]):

    // The following ...
    for {
      bound <- list
      out <- f(bound)
    } yield out
    
    // ... is translated by the Scala compiler as ...
    list.flatMap { bound =>
      f(bound).map { out =>
        out
      }
    }
    
    // ... which can be simplified as ...
    list.flatMap { bound =>
      f(bound)
    }
    
    // ... which is just another way of writing:
    list flatMap f
    
  2. A for-expression with only one <- is converted to a map call with the expression passed as argument:

    // The following ...
    for {
      bound <- list
    } yield f(bound)
    
    // ... is translated by the Scala compiler as ...
    list.map { bound =>
      f(bound)
    }
    
    // ... which is just another way of writing:
    list map f
    

Now to the point

As you can see, the map operation preserves the "shape" of the original monad, so the same happens for the yield expression: a List remains a List with the content transformed by the operation in the yield.

On the other hand each binding line in the for is just a composition of successive monads, which must be "flattened" to maintain a single "external shape".

Suppose for a moment that each internal binding was translated to a map call, but the right-hand was the same A => M[B] function, you would end up with a M[M[B]] for each line in the comprehension.
The intent of the whole for syntax is to easily "flatten" the concatenation of successive monadic operations (i.e. operations that "lift" a value in a "monadic shape": A => M[B]), with the addition of a final map operation that possibly performs a concluding transformation.

I hope this explains the logic behind the choice of translation, which is applied in a mechanical way, that is: n flatMap nested calls concluded by a single map call.

A contrived illustrative example
Meant to show the expressiveness of the for syntax

case class Customer(value: Int)
case class Consultant(portfolio: List[Customer])
case class Branch(consultants: List[Consultant])
case class Company(branches: List[Branch])

def getCompanyValue(company: Company): Int = {

  val valuesList = for {
    branch     <- company.branches
    consultant <- branch.consultants
    customer   <- consultant.portfolio
  } yield (customer.value)

  valuesList reduce (_ + _)
}

Can you guess the type of valuesList?

As already said, the shape of the monad is maintained through the comprehension, so we start with a List in company.branches, and must end with a List.
The inner type instead changes and is determined by the yield expression: which is customer.value: Int

valueList should be a List[Int]

sadolit
  • 27
  • 1
  • 5
pagoda_5b
  • 7,333
  • 1
  • 27
  • 40
  • 1
    The words "is the same as" belong to the meta-language and should be moved out of the code block. – day Dec 14 '13 at 23:38
  • 3
    Every FP beginner should read this. How can this be achieved? – mert inan Jan 10 '14 at 23:37
  • I don't understand what you mean by "Suppose for a moment that each internal binding was translated to a map call, but the right-hand was the same `A => M[B]` function, you would end up with a `M[M[B]]` for each line in the comprehension." Why would you do that? The nature of `map` is `A => A` so that wouldn't happen, right? – melston Apr 05 '15 at 14:35
  • 1
    @melston Let's make an example with `Lists`. If you `map` twice a function `A => List[B]` (which is one of the essential monadic operation) over some value, you end up with a List[List[B]] (we're taking for granted that the types match). The for comprehension inner loop composes those functions with the corresponding `flatMap` operation, "flattening" the List[List[B]] shape into a simple List[B]... I hope this is clear – pagoda_5b Apr 06 '15 at 22:25
  • @pagoda_5b, Not really. But the problem, I think, is that I didn't ask the question correctly. I will have to consider what I am actually confused about and submit a new question. – melston Apr 08 '15 at 00:45
  • 2
    it has been pure awesomeness reading your answer. I wish you would write a book about scala, do you have a blog or something? – Tomer Ben David Sep 04 '15 at 18:06
  • @TomerBenDavid I don't have a blog still, and I had to turn down some book-writing proposals 'cuz I'm too busy right now. Thank you for your appreciation. I'll let you know if I have news on this front. – pagoda_5b Sep 07 '15 at 09:51
  • If map doesn't change i.e List is a List, Option is an Option, how did we go from a List[Int] to an Int when you did yield customer.value ? That yield is a map right? – cool breeze Dec 17 '15 at 14:50
  • 1
    @coolbreeze It could be that I didn't express it clearly. What I meant is that the `yield` clause is `customer.value`, whose type is `Int`, therefore the whole `for comprehension` evaluates to a `List[Int]`. – pagoda_5b Dec 17 '15 at 20:23
  • This is a great explanation; I learned especially from the concrete examples of monads. One correction: The statement, 'each line in the expression using the <- symbol is translated to a `flatMap` call' should be changed to, 'all lines using the `<-` expression but the last are changed to `flatMap` calls; the last is translated to a `map` call'. On such uses of multiple generators (lines using `<-`) see Wampler and Payne, _Programming Scala_ (2015), 225-6. – Ben Weaver Mar 12 '16 at 19:08
  • @BenWeaver it would be of great help if you contribute the edit yourself so that I can eventually approve it. I would appreciate, thanks – pagoda_5b Mar 13 '16 at 19:23
  • Great answer but I'm still not clear how does a `=` figure in the `for`? And how does multiple `for` chaining work, like the implementation of `isFriends1`, [here](http://eed3si9n.com/herding-cats/stacking-future-and-either.html) – Abhijit Sarkar Dec 01 '17 at 21:00
  • @AbhijitSarkar I made a [gist here](https://gist.github.com/ivanopagano/efffd4d7da0f9966f123524e99c09b04) that I hope has no mistakes. The translation is pretty mechanical. – pagoda_5b Dec 04 '17 at 09:45
  • @AbhijitSarkar I don't get what you mean when you ask about the `=` figuring in the `for`? – pagoda_5b Dec 04 '17 at 09:46
  • What I mean is that a `for` comprehension may contain 2 things, `<-` and `=`. `<-` has different meaning based on the context, it can be `flatMap` or `map`. I'm not clear where all can we use `=` and if that changes the processing. Also, usually `for` doesn't change the shape of the "thing", so if it starts with an `Either`, you end up with an `Either` too. However, by chaining `for` with `yield`, it should be possible to change the type. Again, I'm not clear on that. – Abhijit Sarkar Dec 04 '17 at 10:01
  • @AbhijitSarkar as you said already, the `for` won't ever change the shape of the context... if ever you find a code that would seem to do so, there should be an implicit conversion taking place somewhere. – pagoda_5b Dec 04 '17 at 17:09
  • @AbhijitSarkar as I think of it the `<-` would always mean `flatMap` unless it's immediately before the `yield`, hence the last chained element of the `for`. Instead you can think of the `=` as a simple `val a = blahblah` added to the body of the argument to `flatMap`. IIRC there was a proposal to change the syntax to include the `val` keyword here too, for the sake of consistency. – pagoda_5b Dec 04 '17 at 17:12
  • 1
    @AkmalSalikhov because in this specific example it actually turns out to be exactly a nested loop, but without explicit loop variables. But the same exact syntax will work on other data structures supporting `map` and `flatMap`, with different resulting semantics. – pagoda_5b Dec 31 '18 at 08:51
10

I'm not a scala mega mind so feel free to correct me, but this is how I explain the flatMap/map/for-comprehension saga to myself!

To understand for comprehension and it's translation to scala's map / flatMap we must take small steps and understand the composing parts - map and flatMap. But isn't scala's flatMap just map with flatten you ask thyself! if so why do so many developers find it so hard to get the grasp of it or of for-comprehension / flatMap / map. Well, if you just look at scala's map and flatMap signature you see they return the same return type M[B] and they work on the same input argument A (at least the first part to the function they take) if that's so what makes a difference?

Our plan

  1. Understand scala's map.
  2. Understand scala's flatMap.
  3. Understand scala's for comprehension.`

Scala's map

scala map signature:

map[B](f: (A) => B): M[B]

But there is a big part missing when we look at this signature, and it's - where does this A comes from? our container is of type A so its important to look at this function in the context of the container - M[A]. Our container could be a List of items of type A and our map function takes a function which transform each items of type A to type B, then it returns a container of type B (or M[B])

Let's write map's signature taking into account the container:

M[A]: // We are in M[A] context.
    map[B](f: (A) => B): M[B] // map takes a function which knows to transform A to B and then it bundles them in M[B]

Note an extremely highly highly important fact about map - it bundles automatically in the output container M[B] you have no control over it. Let's us stress it again:

  1. map chooses the output container for us and its going to be the same container as the source we work on so for M[A] container we get the same M container only for B M[B] and nothing else!
  2. map does this containerization for us we just give a mapping from A to B and it would put it in the box of M[B] will put it in the box for us!

You see you did not specify how to containerize the item you just specified how to transform the internal items. And as we have the same container M for both M[A] and M[B] this means M[B] is the same container, meaning if you have List[A] then you are going to have a List[B] and more importantly map is doing it for you!

Now that we have dealt with map let's move on to flatMap.

Scala's flatMap

Let's see its signature:

flatMap[B](f: (A) => M[B]): M[B] // we need to show it how to containerize the A into M[B]

You see the big difference from map to flatMap in flatMap we are providing it with the function that does not just convert from A to B but also containerizes it into M[B].

why do we care who does the containerization?

So why do we so much care of the input function to map/flatMap does the containerization into M[B] or the map itself does the containerization for us?

You see in the context of for comprehension what's happening is multiple transformations on the item provided in the for so we are giving the next worker in our assembly line the ability to determine the packaging. imagine we have an assembly line each worker does something to the product and only the last worker is packaging it in a container! welcome to flatMap this is it's purpose, in map each worker when finished working on the item also packages it so you get containers over containers.

The mighty for comprehension

Now let's looks into your for comprehension taking into account what we said above:

def bothMatch(pat:String,pat2:String,s:String):Option[Boolean] = for {
    f <- mkMatcher(pat)   
    g <- mkMatcher(pat2)
} yield f(s) && g(s)

What have we got here:

  1. mkMatcher returns a container the container contains a function: String => Boolean
  2. The rules are the if we have multiple <- they translate to flatMap except for the last one.
  3. As f <- mkMatcher(pat) is first in sequence (think assembly line) all we want out of it is to take f and pass it to the next worker in the assembly line, we let the next worker in our assembly line (the next function) the ability to determine what would be the packaging back of our item this is why the last function is map.
  4. The last g <- mkMatcher(pat2) will use map this is because its last in assembly line! so it can just do the final operation with map( g => which yes! pulls out g and uses the f which has already been pulled out from the container by the flatMap therefore we end up with first:

    mkMatcher(pat) flatMap (f // pull out f function give item to next assembly line worker (you see it has access to f, and do not package it back i mean let the map determine the packaging let the next assembly line worker determine the container. mkMatcher(pat2) map (g => f(s) ...)) // as this is the last function in the assembly line we are going to use map and pull g out of the container and to the packaging back, its map and this packaging will throttle all the way up and be our package or our container, yah!

Tomer Ben David
  • 8,286
  • 1
  • 43
  • 24
5

The rationale is to chain monadic operations which provides as a benefit, proper "fail fast" error handling.

It is actually pretty simple. The mkMatcher method returns an Option (which is a Monad). The result of mkMatcher, the monadic operation, is either a None or a Some(x).

Applying the map or flatMap function to a None always returns a None - the function passed as a parameter to map and flatMap is not evaluated.

Hence in your example, if mkMatcher(pat) returns a None, the flatMap applied to it will return a None (the second monadic operation mkMatcher(pat2) will not be executed) and the final mapwill again return a None. In other words, if any of the operations in the for comprehension, returns a None, you have a fail fast behavior and the rest of the operations are not executed.

This is the monadic style of error handling. The imperative style uses exceptions, which are basically jumps (to a catch clause)

A final note: the patterns function is a typical way of "translating" an imperative style error handling (try...catch) to a monadic style error handling using Option

Bruno Grieder
  • 28,128
  • 8
  • 69
  • 101
  • 1
    Do you know why `flatMap` (and not `map`) is used to "concatenate" the first and the second invocation of `mkMatcher`, but why `map` (and not `flatMap`) is used "concatenate" the second `mkMatcher` and the `yields` block? – Malte Schwerhoff Jan 30 '13 at 08:45
  • 1
    `flatMap` expects you to pass a function returning the result "wrapped"/lifted in the Monad, while `map` will do the wrapping/lifting itself. During call chaining of operations in the `for comprehension` you need to `flatmap` so that the functions passed as parameter are able to return `None` (you cannot lift the value into a None). The last operation call, the one in the `yield` is expected to run *and* return a value; a `map` to chain that last operation is sufficient and avoids having to lift the result of the function into the monad. – Bruno Grieder Jan 30 '13 at 13:01
1

This can be traslated as:

def bothMatch(pat:String,pat2:String,s:String):Option[Boolean] = for {
    f <- mkMatcher(pat)  // for every element from this [list, array,tuple]
    g <- mkMatcher(pat2) // iterate through every iteration of pat
} yield f(s) && g(s)

Run this for a better view of how its expanded

def match items(pat:List[Int] ,pat2:List[Char]):Unit = for {
        f <- pat
        g <- pat2
} println(f +"->"+g)

bothMatch( (1 to 9).toList, ('a' to 'i').toList)

results are:

1 -> a
1 -> b
1 -> c
...
2 -> a
2 -> b
...

This is similar to flatMap - loop through each element in pat and foreach element map it to each element in pat2

korefn
  • 955
  • 6
  • 17
0

First, mkMatcher returns a function whose signature is String => Boolean, that's a regular java procedure which just run Pattern.compile(string), as shown in the pattern function. Then, look at this line

pattern(pat) map (p => (s:String) => p.matcher(s).matches)

The map function is applied to the result of pattern, which is Option[Pattern], so the p in p => xxx is just the pattern you compiled. So, given a pattern p, a new function is constructed, which takes a String s, and check if s matches the pattern.

(s: String) => p.matcher(s).matches

Note, the p variable is bounded to the compiled pattern. Now, it's clear that how a function with signature String => Boolean is constructed by mkMatcher.

Next, let's checkout the bothMatch function, which is based on mkMatcher. To show how bothMathch works, we first look at this part:

mkMatcher(pat2) map (g => f(s) && g(s))

Since we got a function with signature String => Boolean from mkMatcher, which is g in this context, g(s) is equivalent to Pattern.compile(pat2).macher(s).matches, which returns if the String s matches pattern pat2. So how about f(s), it's same as g(s), the only difference is that, the first call of mkMatcher uses flatMap, instead of map, Why? Because mkMatcher(pat2) map (g => ....) returns Option[Boolean], you will get a nested result Option[Option[Boolean]] if you use map for both call, that's not what you want .

xiaowl
  • 5,177
  • 3
  • 27
  • 28