2

I'm using 40 r4.2xlarge slaves and one master with the same type host. r4.2xlarge has 8 cores with 61GB Memory.

Given settings are:

  • spark.executor.instances 280
  • spark.executor.cores 1
  • spark.executor.memory 8G
  • spark.driver.memory 40G
  • spark.yarn.executor.memoryOverhead 10240
  • spark.dynamicAllocation.enabled false

When observing a job running with this cluster in its Ganglia, overall cpu usage is around 30% only. and its resource manager "Aggregated Metrics by Executor" table shows only two executors per slave node.

Why does this cluster run only two executors per slave node even with 280 spark.executor.instances?

---- UPDATE ----

I found the yarn-site.xml under /etc/hadoop/conf.empty

  • yarn.scheduler.maximum-allocation-mb 54272
  • yarn.scheduler.maximum-allocation-vcores 128
  • yarn.nodemanager.resource.cpu-vcores 8
  • yarn.nodemanager.resource.memory-mb 54272
Daebarkee
  • 633
  • 2
  • 6
  • 18

1 Answers1

1

If you are running job on the YARN, the number of executors is not only allocate by this parameter, but a number that depends on the some configuration parameters in the YARN. Possibly parameters are:

yarn.scheduler.maximum-allocation-mb
yarn.scheduler.maximum-allocation-vcores
yarn.nodemanager.resource.cpu-vcores
yarn.nodemanager.resource.memory-mb

Please check that parameters in yarn-site.xml

lvnt
  • 487
  • 3
  • 10
  • Due to yarn.scheduler.maximum-allocation-vcores 128, it seems it could not assign up to 7 executors/node. But still I think it could assign 3 executors/node, that is, 40 * 3 = 120 < 128. Any idea? – Daebarkee Jul 25 '18 at 05:52
  • You must increase memory values. Can you try with higher values? – lvnt Jul 25 '18 at 06:01
  • I'm trying to understand how such situation happened. Based on the configuration, I think that one slave node can use up to 54G memory. Weirdly, executor.memoryOverhead is 10G. but executor.memory is 8G. So, per executor 18G. So, three executors actually need *slightly* more than yarn.nodemanager.resource.memory-mb 54272. That's why only two executors are shown. Is this correct? – Daebarkee Jul 25 '18 at 06:44
  • Yep, you right. And the overhead value is looks too much. You can decrease this value. – lvnt Jul 25 '18 at 06:52