If I want to run my application with 100 cores, how should I configure `number-of-executors` and `executor-cores` to achieve the best performance? Is 100 executors with 1 core each better, or 20 executors with 5 cores each?
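For concreteness, the two configurations I'm comparing would be submitted roughly like this (the class name, jar, and memory setting are just placeholders for my actual application):

```shell
# Option A: many small executors (100 x 1 core = 100 cores)
spark-submit --master yarn \
  --num-executors 100 \
  --executor-cores 1 \
  --executor-memory 2g \
  --class com.example.MyApp myapp.jar   # placeholder class and jar

# Option B: fewer, fatter executors (20 x 5 cores = 100 cores)
spark-submit --master yarn \
  --num-executors 20 \
  --executor-cores 5 \
  --executor-memory 10g \
  --class com.example.MyApp myapp.jar   # placeholder class and jar
```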
From this article, I know that if too many cores run in one executor, it may put pressure on HDFS IO. However, if I use only one core per executor, there will be many executors, which brings about a lot of network IO because of shuffle/broadcast operations.
I am wondering how to balance `number-of-executors` against `executor-cores`. My questions are:
Is it possible to fix the number of cores per executor at a constant value? For example, always take 4 cores from an 8-core machine. If not, what other conditions should I take into account?
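To illustrate the constant-cores idea: if I fix the executor size at 4 cores, the executor count just falls out of the total core budget. A rough sketch (the 100-core figure is from my example above, and this ignores cores reserved for the OS or daemons):

```shell
# Derive the executor count from a fixed cores-per-executor value.
TOTAL_CORES=100       # total cores my application should use
EXECUTOR_CORES=4      # fixed at 4 cores per executor (half of an 8-core machine)
NUM_EXECUTORS=$((TOTAL_CORES / EXECUTOR_CORES))

echo "$NUM_EXECUTORS executors x $EXECUTOR_CORES cores each"
```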
However, when running Spark on YARN, a Spark application can't always get the amount of resources it requested. Given the previous example, if I choose 100 executors with 1 core each, I may not get all 100 executors, so my application could be up to 5 times slower. But if I choose 20 executors with 5 cores each, I may get exactly 20 executors, so my application may run faster. How can we choose proper values for `number-of-executors` and `executor-cores` then?