My goal is to find the k nearest neighbours of each data point. I would like to avoid a for loop with lookup and instead compute the neighbours for every key of rdd_distance at once,
but I can't figure out how to do this.
// parsedData: RDD[Object]
// Each Object has an id and a vector as attributes
// sqdist1 returns a Double
val rdd_distance = parsedData.cartesian(parsedData)
  .flatMap { case (x, y) =>
    if (x.get_id != y.get_id)
      Some((x.get_id, (y.get_id, sqdist1(x.get_vector, y.get_vector))))
    else None
  }
for (ind1 <- 1 to size) {
  val ind2 = ind1.toString
  val tab1 = rdd_distance.lookup(ind2)
  val rdd_knn0 = sc.parallelize(tab1)
  val tab_knn = rdd_knn0.takeOrdered(k)(Ordering[Double].on(x => x._2))
}
Is it possible to do this without a for loop and lookup?
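For illustration, one loop-free direction might look like the sketch below. It is only a sketch under assumptions: ids are taken to be Strings (as the lookup call above suggests), rdd_distance is taken to have the shape (id, (neighbourId, distance)), and the names KnnSketch, Neighbour, insert, merge and kNearest are hypothetical. It relies on aggregateByKey to keep only the k smallest distances per key in a single pass.

```scala
import org.apache.spark.rdd.RDD

object KnnSketch {
  // (neighbourId, squaredDistance); String ids are an assumption
  type Neighbour = (String, Double)

  // Add one candidate to a partial result, keeping at most k entries sorted by distance
  def insert(k: Int, acc: List[Neighbour], n: Neighbour): List[Neighbour] =
    (n :: acc).sortBy(_._2).take(k)

  // Combine two partial results coming from different partitions
  def merge(k: Int, a: List[Neighbour], b: List[Neighbour]): List[Neighbour] =
    (a ++ b).sortBy(_._2).take(k)

  // distances is assumed to look like rdd_distance: (id, (neighbourId, distance))
  def kNearest(distances: RDD[(String, Neighbour)], k: Int): RDD[(String, List[Neighbour])] =
    distances.aggregateByKey(List.empty[Neighbour])(
      (acc, n) => insert(k, acc, n),
      (a, b) => merge(k, a, b)
    )
}
```

With this, something like KnnSketch.kNearest(rdd_distance, k) would return one record per id holding its k closest neighbours, with no per-id lookup. Re-sorting the small list on every insert is kept for simplicity; a bounded priority queue would presumably be cheaper for large k.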