I am running Spark jobs on EMR with YARN and don't understand how memory is provisioned and reported in the UIs. I have a master node and one core node of instance type r4.8xlarge, which should have 32 cores and 244 GB of memory. According to this doc, 241 GB of that should be allocated to YARN; in the UI the number is 236 GB, probably due to additional overheads. Based on best practices, I have configured the job with the settings below.
--executor-cores 5 --executor-memory 35GB --num-executors 6 --conf spark.dynamicAllocation.enabled=false
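For completeness, here is the same sizing expressed programmatically. This is only a minimal PySpark sketch, assuming the job is written in Python (the actual job code isn't shown here, and the app name is made up); the same keys can equally be set via --conf on spark-submit or in spark-defaults:

```python
from pyspark.sql import SparkSession

# Equivalent of the spark-submit flags above. These must be set before the
# Spark context starts (e.g. when this script is launched with spark-submit
# on YARN); they have no effect on an already-running session.
spark = (
    SparkSession.builder
    .appName("executor-sizing-test")                      # hypothetical app name
    .config("spark.executor.cores", "5")                  # --executor-cores 5
    .config("spark.executor.memory", "35g")               # --executor-memory 35GB
    .config("spark.executor.instances", "6")              # --num-executors 6
    .config("spark.dynamicAllocation.enabled", "false")
    .getOrCreate()
)
```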
Calculation for executor memory: (236 GB / 6 executors) * 0.9 ≈ 35 GB
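Just to show my arithmetic explicitly (a throwaway Python snippet, nothing Spark-specific):

```python
# Reproduce the executor-memory sizing from above.
yarn_memory_gb = 236      # memory the UI reports as available to YARN on the core node
num_executors = 6
safety_factor = 0.9       # leave ~10% headroom, per the tuning guides I followed

executor_memory_gb = yarn_memory_gb / num_executors * safety_factor
print(f"{executor_memory_gb:.1f} GB per executor")   # -> 35.4, which I rounded down to 35
```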
When I submit a Spark job and look at the executor metrics in the Spark UI or the console, the numbers are very different, and I am confused about how they are calculated and provisioned. Instead of 6 executors there are only 4, so the job uses only 20 cores instead of the 30 available. Each executor shows 22.2 GB of memory instead of 35 GB, which is only about 88 GB of the 236 GB available.
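To make the gap concrete, here is what I requested versus what the Spark UI reports (same throwaway Python, just the arithmetic from the numbers above):

```python
# Requested resources vs what the Spark UI actually shows.
expected = {"executors": 6, "cores": 6 * 5, "memory_gb": 6 * 35}     # 30 cores, 210 GB
observed = {"executors": 4, "cores": 4 * 5, "memory_gb": 4 * 22.2}   # 20 cores, ~88.8 GB

print("expected:", expected)
print("observed:", observed)
```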
I have looked at many resources, but they only discuss how to tune Spark jobs by setting YARN and Spark configuration, which I have followed, yet the results are unexpected.
Can someone help explain?
edit: The only applications installed on the cluster are Spark and Hadoop.