
I am running the code below and getting the error `Cannot use map-side combining with array keys`:

val lines = sc.textFile("../book.txt")
val line = lines.map(x => (x.split(" ")))
val tense = line.map(x => (x,1)).reduceByKey((x,y) => x + y)
tense.foreach(println)

But when I use flatMap instead of map in line 2 it works perfectly. Why?

Jacek Laskowski
rishabhrrk
    Because `lines.map(x => (x.split(" ")))` results in an `RDD[Array[String]]` (i.e. each line in the text file becomes an array of split words). Replacing `map` with `flatMap` would flatten the RDD into an `RDD[String]` of split words across all lines. – Leo C Aug 19 '18 at 05:18
  • 1
    See also https://stackoverflow.com/questions/32698428/why-spark-doesnt-allow-map-side-combining-with-array-keys - to understand why Spark can't use `reduceByKey` if the key has type `Array`. In this case, if you're trying to count words, what you actually need is `flatMap` (as pointed out in previous comment) which would create an RDD with key of type `String`. – Tzach Zohar Aug 19 '18 at 15:23
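To see the difference the comments describe without needing a Spark cluster, here is a minimal sketch using plain Scala collections (the semantics of `map` vs `flatMap` are the same as on RDDs; `groupBy`/`size` stands in for `reduceByKey`, and the sample text is made up):

```scala
val lines = List("to be or", "not to be")

// map: one Array[String] per line -> the analogue of RDD[Array[String]].
// Using these arrays as keys is what Spark rejects, because JVM arrays
// compare by reference, not by content:
val perLine = lines.map(_.split(" "))
assert(Array("to").equals(Array("to")) == false)  // reference equality!

// flatMap: flattens all the arrays -> the analogue of RDD[String],
// whose String keys hash and compare by value, so counting works:
val words = lines.flatMap(_.split(" "))
val counts = words.groupBy(identity).map { case (w, ws) => (w, ws.size) }
// counts: Map(to -> 2, be -> 2, or -> 1, not -> 1)
```

Because `reduceByKey` hashes keys to combine partial sums on the map side, it requires keys with value-based `equals`/`hashCode`, which `Array` does not provide.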

0 Answers