In the MapReduce framework, the Map method (a transformation) works on each datapoint (k,v) to produce a new datapoint (k',v'). Is there any mechanism to generate a pair of datapoints (k',v') and (k'',v'') from a single input datapoint?
I am using Apache Spark. Here is my code snippet:
JavaRDD<String> myrdd = sc.textFile(...);
JavaRDD<String> newrdd = myrdd.map(
    new Function<String, String>() {
        // map() produces exactly one output element per input element
        public String call(String s) {
            ...
        }
    }
);
By default, myrdd and newrdd have the same size. But my objective is to have two entries in newrdd for each datapoint of myrdd. How is this possible?
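The one-to-many behaviour asked about here is what a flatMap-style transformation provides: each input element yields zero or more outputs, and the results are concatenated. Spark's JavaRDD has a flatMap method that takes a FlatMapFunction (its call method returns an Iterator in Spark 2.x and later, an Iterable in 1.x). So that it runs without a cluster, the minimal sketch below uses java.util.stream's Stream.flatMap as a stand-in with the same semantics. The class name PairExpansion, the helper toPair, and the ":first"/":second" suffixes are hypothetical placeholders for whatever the two derived records should actually contain.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PairExpansion {
    // Hypothetical: derive two records from one input record.
    // Replace the suffixes with the real (k',v') and (k'',v'') logic.
    static Stream<String> toPair(String s) {
        return Stream.of(s + ":first", s + ":second");
    }

    // flatMap concatenates the per-element streams, so the output
    // contains two elements for every input element.
    static List<String> expand(List<String> input) {
        return input.stream()
                .flatMap(PairExpansion::toPair)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = expand(List.of("a", "b"));
        System.out.println(out); // prints [a:first, a:second, b:first, b:second]
    }
}
```

In Spark the same idea would mean calling myrdd.flatMap(...) instead of myrdd.map(...) and returning both derived strings from the function, so newrdd ends up twice the size of myrdd.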