1

I am trying to understand how map and flatMap works but got stuck at below piece of code. flatMap() function returns an RDD[Char] but I was expecting the RDD[String] instead. Can someone explain why it yields the RDD[Char] ?

scala> val inputRDD = sc.parallelize(Array(Array("This is Spark"), Array("It is a processing language"),Array("Very fast"),Array("Memory operations")))

scala> val mapRDD = inputRDD.map(x => x(0))
mapRDD: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[28] at map at <console>:26

scala> mapRDD.collect
res27: Array[String] = Array(This is Spark, It is a processing language, Very fast, Memory operations)

scala> val mapRDD = inputRDD.flatMap(x => x(0))
mapRDD: org.apache.spark.rdd.RDD[Char] = MapPartitionsRDD[29] at flatMap at <console>:26

scala> mapRDD.collect
res28: Array[Char] = Array(T, h, i, s,  , i, s,  , S, p, a, r, k, I, t,  , i, s,  , a,  , p, r, o, c, e, s, s, i, n, g,  , l, a, n, g, u, a, g, e, V, e, r, y,  , f, a, s, t, M, e, m, o, r, y,  , o, p, e, r, a, t, i, o, n, s)
Rahul
  • 619
  • 1
  • 5
  • 15
  • Possible duplicate of [Can someone explain to me the difference between map and flatMap and what is a good use case for each?](https://stackoverflow.com/questions/22350722/can-someone-explain-to-me-the-difference-between-map-and-flatmap-and-what-is-a-g) – Jacek Laskowski Jun 25 '17 at 15:43

2 Answers2

2

Take a look at this answer: https://stackoverflow.com/a/22510434/1547734

Basically flatmap transforms an RDD of N elements into (logically) an RDD of N collections and then flattens it into an RDD of all ELEMENTS of the internal collections.

So when you do inputRDD.flatMap(x => x(0)) then you convert each element into a string. A string is a collection of characters so the "flattening" portion would turn the entire RDD into an RDD of the resulting characters.

Since RDD are based on scala collections the following http://www.brunton-spall.co.uk/post/2011/12/02/map-map-and-flatmap-in-scala/ might help understanding it more.

Assaf Mendelson
  • 12,701
  • 5
  • 47
  • 56
2

The goal of flatMap is to convert a single item into multiple items (i.e. a one-to-many relationship). For example, for an RDD[Order], where each order is likely to have multiple items, I can use flatMap to get an RDD[Item] (rather than an RDD[Seq[Item]]).

In your case, a String is effectively a Seq[Char]. It therefore assumes that what you want to do is take that one string and break it up into its constituent characters.

Now, if what you want is to use flatMap to get all of the raw Strings in your RDD, your flatMap function should probably look like this: x => x.

Joe C
  • 15,324
  • 8
  • 38
  • 50