I am new to Scala and Spark. This is a simplified example of my code:
package trouble.something

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Stack {
  def ExFunc2(looku: RDD[(Int, List[(Double, Int)])], ke: Int): Seq[List[(Double, Int)]] = {
    val y: Seq[List[(Double, Int)]] = looku.lookup(ke)
    val g = y.map { x =>
      x
      /* some functions here
      .
      .
      */
    }
    g
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("toy")
    val sc = new SparkContext(conf)
    val pi: RDD[(Int, List[(Double, Int)])] = sc.parallelize(Seq(
      (1, List((9.0, 3), (7.0, 2))),
      (2, List((7.0, 1), (1.0, 3))),
      (3, List((1.0, 2), (9.0, 1)))))
    val res = ExFunc2(pi, 1)
    println(res)
  }
}
I am running this on a fairly large data set and need better performance. According to Spark's web UI and a software profiler, most of the time is spent in the lookup() function:
val y: Seq[List[(Double, Int)]] = looku.lookup(ke)
Is there a faster alternative to lookup() for looking up an element in an RDD?
There is a related discussion, "Spark: Fastest way to look up an element in an RDD". However, it did not give me any ideas.
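For context, the Spark API documentation for lookup() notes that the operation is done efficiently when the RDD has a known partitioner, because only the partition the key maps to is searched. The following is a minimal sketch of that setup using the same toy data, not a measured fix. The object name PartitionedLookup and the partition count of 8 are assumptions chosen for illustration, and any speed-up on the real data would need to be verified.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical object name, used only for this sketch.
object PartitionedLookup {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("toy")
    val sc = new SparkContext(conf)

    val pi: RDD[(Int, List[(Double, Int)])] = sc.parallelize(Seq(
      (1, List((9.0, 3), (7.0, 2))),
      (2, List((7.0, 1), (1.0, 3))),
      (3, List((1.0, 2), (9.0, 1)))))

    // Give the RDD a known partitioner and keep the result in memory.
    // Assumption: 8 partitions is an arbitrary value for illustration.
    val indexed = pi.partitionBy(new HashPartitioner(8)).cache()

    // Run one action so the shuffle and caching happen once, up front.
    indexed.count()

    // With a known partitioner, lookup() only scans the partition
    // that the key hashes to instead of every partition.
    val res: Seq[List[(Double, Int)]] = indexed.lookup(1)
    println(res)

    sc.stop()
  }
}
```

If the whole data set fits in driver memory, another option is to call collectAsMap() once and query the resulting local Map. Note that collectAsMap() keeps only one value per key, so it is only suitable when keys are unique.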