I am trying to work through the Java word count example. As I understand it, Spark RDDs are a special type of collection, and flatMap basically flattens a nested collection, e.g. Stream<Stream<T>> => Stream<T>. Why, then, does the Spark Java API in the line below need to return an iterator for each line? And how is that iterator used in the RDD?
Shouldn't the function just end with Arrays.asList(line.toString().split(" "))?
JavaRDD<String> words =
    lines.flatMap(line -> Arrays.asList(line.toString().split(" ")).iterator());
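For context, here is a minimal, self-contained version of what I am running. This is only a sketch of my setup, assuming Spark 2.x (where FlatMapFunction's call method returns Iterator<U> rather than the Iterable<U> of 1.x, as far as I can tell from the docs) and a local master; the class name WordCount and the input path "input.txt" are placeholders, not part of the original example:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        // Local master and app name are placeholders for illustration.
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // "input.txt" is a hypothetical path, not from the original example.
            JavaRDD<String> lines = sc.textFile("input.txt");

            // The line in question: the lambda must return an Iterator<String>,
            // and Spark concatenates the elements each iterator yields into one
            // flat RDD of words.
            JavaRDD<String> words =
                lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());

            // The rest of the standard word count, for completeness.
            JavaPairRDD<String, Integer> counts =
                words.mapToPair(w -> new Tuple2<>(w, 1))
                     .reduceByKey(Integer::sum);

            counts.collect().forEach(t -> System.out.println(t._1() + ": " + t._2()));
        }
    }
}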