Probably a basic question; I'm pretty new to Spark / Scala.
I have a variable of type Map[String, RDD[Int]]. When I iterate through it with a for loop, I can't do anything with the RDDs inside: invoking any kind of action or transformation on them throws an error.
I thought that since the Map itself isn't an RDD, iterating over it with a simple for loop wouldn't count as a transformation, so I'm pretty confused. Here is what the code looks like:
def trendingSets(pairRDD: RDD[(String, Int)]): Map[String, RDD[Int]] = {
  pairRDD
    .groupByKey()
    .mapValues(v => { this.sc.parallelize(v.toList) })
    .take(20)
    .toMap
}
def main(args: Array[String]) {
  val sets = this.trendingSets(pairRDD)
  // Inside this loop, no transformations or actions work.
  for ((tag, rdd) <- sets) {
    // For instance, this fails:
    // val x = rdd.collect()
  }
}
The error is:
Exception in thread "main" org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases:
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
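In case it helps to reproduce, the situation seems to match case (1): this.sc.parallelize runs inside mapValues, i.e. inside a transformation of pairRDD, so the inner RDDs are created where they can't keep a live SparkContext. A minimal, self-contained sketch of the same pattern (hypothetical names, untested, assuming a local Spark setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical standalone reproduction of the pattern above; not the
// original code, just an untested sketch for illustration.
object Repro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("repro").setMaster("local[*]"))

    val pairRDD: RDD[(String, Int)] =
      sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

    val sets: Map[String, RDD[Int]] = pairRDD
      .groupByKey()
      // sc.parallelize is invoked inside a transformation of pairRDD,
      // so each inner RDD is built without a usable SparkContext.
      .mapValues(v => sc.parallelize(v.toList))
      .take(2)
      .toMap

    // Any action on the inner RDDs then raises
    // "This RDD lacks a SparkContext" (SPARK-5063):
    for ((tag, rdd) <- sets) {
      // rdd.collect()  // fails here
    }

    sc.stop()
  }
}
```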
Any help would be appreciated. Thanks!