I'm new to Spark, and my problem is the following: I have a JavaPairRDD already loaded with data, and now I need to apply a map transformation to it so that I get back a new RDD whose values depend on some inner transformations inside the map function, like this (pseudo-code):
JavaPairRDD<Long, Long> originalRDD = ...; // the one I load from the dataset
JavaPairRDD<Long, Long> anotherrdd = ...;  // the source of tuples

JavaPairRDD<Tuple2<Long, Long>, Long> result = anotherrdd
    .mapToPair(tuple -> {
        // values of the originalRDD entries whose key matches tuple._1()
        JavaRDD<Long> aux1 = originalRDD.filter(T -> T._1().equals(tuple._1())).values();
        // values of the originalRDD entries whose value matches tuple._2()
        JavaRDD<Long> aux2 = originalRDD.filter(T -> T._2().equals(tuple._2())).values();
        JavaRDD<Long> auxfinal = aux1.intersection(aux2);
        // some other code here that processes auxfinal and returns a
        // new Tuple2 for result
    });
If I code it this way, does the executor create new jobs (for the filters and the intersection) and launch them itself, or will the SparkContext be aware of this and create those jobs? I've been reading the official documentation and it doesn't clarify what happens in this case. Thanks in advance!
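In case it helps to see what I'm trying to express: below is the driver-side fallback I had in mind if the nested filters turn out not to be allowed, i.e. collect originalRDD once, broadcast it, and do the lookups locally inside the lambda. This is only a sketch; the names jsc (my JavaSparkContext), byKey, byValue and the count placeholder are made up, and it assumes originalRDD is small enough to collect to the driver.

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.broadcast.Broadcast;
import scala.Tuple2;

// Collect the (hopefully small) originalRDD once and ship it to every executor.
List<Tuple2<Long, Long>> collected = originalRDD.collect();
Broadcast<List<Tuple2<Long, Long>>> broadcastPairs = jsc.broadcast(collected);

JavaPairRDD<Tuple2<Long, Long>, Long> result = anotherrdd.mapToPair(tuple -> {
    List<Tuple2<Long, Long>> pairs = broadcastPairs.value();
    // values of the entries whose key matches tuple._1()
    Set<Long> byKey = pairs.stream()
            .filter(p -> p._1().equals(tuple._1()))
            .map(p -> p._2())
            .collect(Collectors.toSet());
    // values of the entries whose value matches tuple._2()
    Set<Long> byValue = pairs.stream()
            .filter(p -> p._2().equals(tuple._2()))
            .map(p -> p._2())
            .collect(Collectors.toSet());
    byKey.retainAll(byValue);   // local intersection instead of RDD.intersection
    long count = byKey.size();  // placeholder for the real processing of auxfinal
    return new Tuple2<>(tuple, count);
});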