I have a large class named "DataClass" including the following members: "time", "value", "type", "name", "family". These are distributed as:
JavaPairRDD<key, DataClass> distributedRDD;
Currently what I do is group all these together in the following way:
JavaPairRDD<key, List<DataClass>> distributedRDD.groupByKey();
I currently only need to use two members of this large "DataClass", namely: "time" and "value". In order to improve performance i wanted to avoid shuffling this large data type, and perhaps try and perform the shuffle only on the desired members.
One of the things that came to mind is somehow use reduceByKey in order to reduce the values from "DataClass" to "SmallDataClass" (including only desired members) and shuffle on the smaller class.
Can anyone please help in performing this task?