
I'm trying to set the number of partitions manually, instead of letting Apache Spark decide on its own based on available resources.

Could someone help me write the right code for this?

I have tried using: JavaRDD<String> distdata = sc.parallelize(data, 2);

It doesn't work. I'm running on a machine with a GPU and high RAM, where Spark by default creates 122 partitions, which I do not want. I'm looking for a way to set a smaller number of partitions manually.

    File file = new File("dataset/adult.txt");
    JavaSparkContext context = new JavaSparkContext("local", "SparkAnonymize");
    // the second argument of textFile is only a *minimum* number of partitions, not an exact count
    JavaRDD<String> data = context.textFile(file.getPath(), 2);
    // does not compile: `sc` should be the JavaSparkContext above, and
    // parallelize expects a java.util.List, not a JavaRDD
    JavaRDD<String> distdata = sc.parallelize(data, 2);
    List<String> list = data.top((int) data.count());

1 Answer


Use the repartition function to control the number of partitions if you don't want Spark's default parallelism.

Like this:

    JavaRDD<String> data = context.textFile(file.getPath()).repartition(2);

This should work!!
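
For reference, here is a minimal self-contained sketch of the full flow, assuming the same dataset/adult.txt file from the question. getNumPartitions() lets you confirm the partition count, and coalesce(2) is an alternative worth knowing: when you are only reducing the number of partitions, it avoids the full shuffle that repartition triggers.

    import java.util.List;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkAnonymize {
        public static void main(String[] args) {
            JavaSparkContext context = new JavaSparkContext("local", "SparkAnonymize");

            // repartition(2) forces exactly 2 partitions (it performs a shuffle)
            JavaRDD<String> data = context.textFile("dataset/adult.txt").repartition(2);
            System.out.println("partitions: " + data.getNumPartitions()); // prints 2

            // coalesce(2) also reduces the partition count, without a full shuffle
            JavaRDD<String> coalesced = context.textFile("dataset/adult.txt").coalesce(2);
            System.out.println("partitions: " + coalesced.getNumPartitions());

            List<String> list = data.top((int) data.count());
            context.stop();
        }
    }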
