
In the MapReduce framework, the map method (a transformation) works on each data point (k, v) to produce one new data point (k', v'). Is there any mechanism to generate a pair of data points, (k', v') and (k'', v''), from a single input?

I am using Apache Spark. Here is the code snippet:

JavaRDD<String> myrdd = sc.textFile(...);

JavaRDD<String> newrdd = myrdd.map(
                    new Function<String, String>() {

                        public String call(String s) {
                            ...
                        }
                    }
                    );

By default, myrdd and newrdd have the same size. My objective is to have two entries in newrdd for each data point of myrdd. How is this possible?

Soumya Kanti
    Check out `flatMap`: http://stackoverflow.com/questions/22350722/can-someone-explain-to-me-the-difference-between-map-and-flatmap-and-what-is-a-g and https://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/api/java/JavaRDDLike.html#flatMap%28org.apache.spark.api.java.function.FlatMapFunction%29 – GPI Sep 30 '15 at 12:13
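
To illustrate the `flatMap` idea from the comment, here is a minimal plain-Java sketch. It uses `java.util.stream` as a stand-in for the RDD so that it runs without a Spark installation; the class name `FlatMapSketch`, the helpers `expand` and `expandAll`, and the "-a"/"-b" suffixes are hypothetical and only show the one-input, two-outputs shape.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapSketch {
    // Hypothetical expansion rule: each input record yields two output records.
    static Stream<String> expand(String s) {
        return Stream.of(s + "-a", s + "-b");
    }

    // flatMap concatenates the per-record results into one flat collection,
    // so the output holds twice as many entries as the input.
    static List<String> expandAll(List<String> input) {
        return input.stream()
                .flatMap(FlatMapSketch::expand)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> myData = Arrays.asList("x", "y");
        List<String> newData = expandAll(myData);
        System.out.println(newData); // prints [x-a, x-b, y-a, y-b]
    }
}
```

With a `JavaRDD<String>`, the same shape would presumably go through `myrdd.flatMap(...)` with a `FlatMapFunction<String, String>`; note that in Spark 1.x its `call` method returns an `Iterable`, while from Spark 2.0 on it returns an `Iterator`. If the outputs are key-value pairs, `flatMapToPair` is the corresponding method.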
