
I am running a Spark job on a cluster that initially has 4 nodes. The cluster is autoscalable, so during high load the number of nodes scales up to 15. However, the number of partitions was set at startup based on the 4-node cluster, so when the cluster scales up to 15 nodes the partition count stays the same as the one assigned at startup. My question is: am I utilizing my cluster fully with the same number of partitions even though I now have more executors, or does Spark handle this internally?

Do I have to change the number of partitions dynamically when the cluster scales up? If so, how can I achieve this in my Spark job?

Any inputs are highly appreciated.

Thanks in advance!!

Ankit
  • By default Spark will create partitions based on the number of cores in your cluster. Also, [here](https://stackoverflow.com/questions/47399087/spark-get-number-of-cluster-cores-programmatically) is a way to determine the size of your cluster programmatically and then create partitions depending on that size – Vladislav Varslavans May 18 '18 at 06:58
  • Thanks @Vladislav Varslavans for the useful link. Since I am running a Spark Streaming job, can I change this number for every batch? Is it good practice to change the number of partitions for every batch depending on the size of the cluster? – Ankit May 28 '18 at 02:00
  • If you need a certain number of partitions, I guess you have no choice but to set the number of partitions yourself; otherwise, Spark will handle the number of partitions automatically (see the sketch after these comments) – Vladislav Varslavans May 28 '18 at 07:26
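
Combining the suggestions from the comments, below is a rough, untested sketch in Scala of what per-batch repartitioning based on the current executor count could look like. The object name `DynamicRepartitionSketch`, the socket source, the 30-second batch interval, and the "two tasks per core" heuristic are all assumptions for illustration; `sc.statusTracker.getExecutorInfos` is the programmatic cluster-size lookup from the linked question.

```scala
// Hypothetical sketch only: recompute a partition count for every micro-batch
// from the executor count visible at that moment, so an autoscaled cluster
// gets more partitions on later batches.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DynamicRepartitionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dynamic-repartition-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30)) // assumed batch interval
    val sc   = ssc.sparkContext

    // Placeholder source; substitute the job's real input DStream (Kafka, etc.).
    val lines = ssc.socketTextStream("localhost", 9999)

    // transform() runs on the driver for every batch, so the executor count
    // is re-read each time the cluster scales up or down.
    val repartitioned = lines.transform { rdd =>
      // getExecutorInfos includes the driver, hence the "- 1".
      val executors    = math.max(sc.statusTracker.getExecutorInfos.length - 1, 1)
      val coresPerExec = sc.getConf.getInt("spark.executor.cores", 1)
      // Assumed heuristic: roughly two tasks per core.
      val targetPartitions = math.max(executors * coresPerExec * 2, 1)
      rdd.repartition(targetPartitions)
    }

    repartitioned.foreachRDD { rdd =>
      // Downstream processing now runs with the per-batch partition count.
      println(s"Batch processed with ${rdd.getNumPartitions} partitions")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that `repartition` triggers a shuffle on every batch, so whether this pays off depends on how expensive the downstream work is compared to the shuffle.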

0 Answers