I'm running two copies of my Spark Streaming application (Spark 2.2.1, EMR 5.11, Scala) on AWS EMR (a 3-node m4.4xlarge cluster: 16 vCPUs and 64 GB RAM per node).
In the built-in EMR cluster monitoring (Ganglia) I see that cluster CPU utilization stays below 30%, memory usage never exceeds 32 GB of the ~200 GB available, and network traffic is also far from saturated. Yet the applications can barely finish batch processing within the batch interval.
Here are the parameters I'm using to submit each copy of the app to the master in client mode:
```
--master yarn
--num-executors 2
--executor-cores 20
--executor-memory 20G
--conf spark.driver.memory=4G
--conf spark.driver.cores=3
```
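For context, here is the full `spark-submit` invocation assembled from those flags (the main class and JAR name below are placeholders, not my actual identifiers):

```shell
# Full submit command built from the flags listed above.
# NOTE: --class and the JAR path are placeholders for illustration only.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 2 \
  --executor-cores 20 \
  --executor-memory 20G \
  --conf spark.driver.memory=4G \
  --conf spark.driver.cores=3 \
  --class com.example.StreamingApp \
  my-streaming-app.jar
```

This is run once per copy of the application, so both copies request the same executor resources from YARN.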
How can I achieve better resource utilization and improve the app's performance?