I'm trying to set the number of partitions manually instead of letting Apache Spark choose it based on the available resources.
Could someone help me write the right code for this?
I have tried: JavaRDD<String> distdata = sc.parallelize(data, 2);
This doesn't work. I'm running on a machine with a GPU and a lot of RAM, where Spark defaults to 122 partitions, which I don't want. I'm looking for a way to set a smaller partition count manually.
File file = new File("dataset/adult.txt");
JavaSparkContext context = new JavaSparkContext("local", "SparkAnonymize");
JavaRDD<String> data = context.textFile(file.getPath(), 2); // 2 is a minimum partition count
JavaRDD<String> distdata = data.repartition(2); // parallelize() takes a local collection, not an RDD
List<String> list = data.top((int) data.count());
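For reference, a minimal sketch of the three ways Spark lets you control the partition count: the numSlices argument of parallelize() for local collections, repartition() to re-split an existing RDD (full shuffle), and coalesce() to merge partitions down without a shuffle. The class name PartitionSketch and the sample data are made up for illustration; textFile(path, n) treats n only as a minimum, so the actual count can be higher.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PartitionSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("PartitionSketch");
        JavaSparkContext context = new JavaSparkContext(conf);

        // parallelize() accepts a local collection plus an explicit partition count
        JavaRDD<Integer> fromList = context.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6), 2);
        System.out.println(fromList.getNumPartitions()); // 2

        // an existing RDD is re-split with repartition() (shuffles all data)
        // or shrunk with coalesce() (merges partitions, avoids a shuffle)
        JavaRDD<Integer> merged = fromList.repartition(3).coalesce(2);
        System.out.println(merged.getNumPartitions()); // 2

        context.stop();
    }
}
```

repartition() is the one to use when the current count is too high (e.g. the 122 partitions above) and you want an exact, smaller number; coalesce() is cheaper but can only reduce the count.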