0

I think the title pretty much sums up what I am trying to do here. I have the following piece of code

implicit val sc: SparkContext = spark.sparkContext
val result = RDD[RDD[GenericRecord]] = sc.parallelize(dates).map { date => 
    val foo: RDD[GenericRecord] = readSomething(...)
    foo
}

I want to convert result to an RDD of GenericRecord but foo is not Traversable so that I can use flatMap. Any ideas here?

Niko
  • 616
  • 4
  • 20

1 Answers1

0

As discussed here, Spark does not support nested RDDs. So even if I was able to flat map this somehow it would fail on runtime. What I ended up doing is the following:

implicit val sc: SparkContext = spark.sparkContext
val partials = IndexedSeq[RDD[GenericRecord]] = dates.map { date => 
    val foo: RDD[GenericRecord] = readSomething(...)
    foo
}

val result:RDD[GenericRecord] = sc.union(partials)
Niko
  • 616
  • 4
  • 20