I'm trying to use all resources on my EMR cluster.

The cluster itself is 4 m4.4xlarge machines (1 driver and 3 workers), each with 16 vCores, 64 GiB of memory and 128 GiB of EBS storage.

When launching the cluster through the CLI I'm presented with the following options (all 3 options were executed within the same data pipeline):

Just use "maximizeResourceAllocation" without any other spark-submit parameter

This only gives me 2 executors (screenshot of the Spark UI executors tab).
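
For context, maximizeResourceAllocation is set through the spark classification at cluster creation; a minimal sketch of that configuration, assuming it is passed to the CLI with --configurations, looks like this:

      [
        {
          "Classification": "spark",
          "Properties": {
            "maximizeResourceAllocation": "true"
          }
        }
      ]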

Don't put anything, leave spark-defaults to do their job

Gives the following low-quality executors (screenshot of the executors tab).
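
For reference, the defaults that EMR actually applies can be inspected on the master node; a sketch, with the path as on a typical EMR install:

      # print the spark-defaults EMR generated for this cluster
      cat /etc/spark/conf/spark-defaults.conf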

Use AWS's guide on how to configure the cluster in EMR

Following this guide, I deduced the following spark-submit parameters:

      "--conf",
      "spark.executor.cores=5",
      "--conf",
      "spark.executor.memory=18g",
      "--conf",
      "spark.executor.memoryOverhead=3g",
      "--conf",
      "spark.executor.instances=9",

      "--conf",
      "spark.driver.cores=5",
      "--conf",
      "spark.driver.memory=18g",

      "--conf",
      "spark.default.parallelism=45",

      "--conf",
      "spark.sql.shuffle.partitions=45",

Aaand still no luck: (screenshot of the executors tab)

Now I did look everywhere I could on the internet, but couldn't find any explanation of why EMR doesn't use all the resources provided. Maybe I'm missing something, or maybe this is expected behaviour, but when "maximizeResourceAllocation" only spins up 2 executors on a cluster with 3 workers, something seems wrong.

UPDATE:

So today, while running a different data pipeline, I got this using "maximizeResourceAllocation" (screenshot of the executors tab), which is much, much better than the other ones, but still lacks a lot in terms of used memory and executors (although someone from the EMR team said that EMR merges executors into super-executors to improve performance).

Mironor
  • Were you performing the same task (same dataset size etc.) when you got more executors? ("So today I got this using "maximizeResourceAllocation") – A.B Sep 27 '21 at 23:09
  • Can you confirm how you are providing this configuration and that you are using the right arguments (e.g. --executor-memory etc., or via --conf)? The quotation marks around "--conf" in your question make me unsure whether you are doing this with spark-submit or at cluster creation time. – A.B Sep 27 '21 at 23:17
  • @A.B The ones before "update" were done during the same pipeline; the "update" one was different (I got into the habit of looking at the "executors" tab on all pipelines to see if it would change, and indeed it would). My main question is why it ignores everything I explicitly provide as a parameter. As for the other question, I'm using the EMR CLI described here https://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html#examples (look for the last mention of "Args"); it takes a JSON list with one entry for each argument that would normally be separated by a space. – Mironor Sep 28 '21 at 09:23
  • I also tried using `"--executor-cores", "5",` (for all of the parameters above, instead of using `--conf`) and it gives the same result – Mironor Sep 28 '21 at 10:13

2 Answers

Have you tried setting the --master yarn parameter and replacing spark.executor.memoryOverhead with spark.yarn.executor.memoryOverhead?
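
i.e. roughly something like this (just a sketch; on older Spark versions the yarn-prefixed overhead property takes a plain value in MiB):

      spark-submit \
        --master yarn \
        --conf spark.yarn.executor.memoryOverhead=3072 \
        <the other --conf flags from the question> \
        your_job.py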

Pierre M

I wanted to add my answer, even though I cannot explain all the cases that you are seeing.

But I will try to explain those that I can, as best as I can.

Case 1: Maximum Resource Allocation

This is an Amazon EMR-specific configuration and I'm not completely sure what EMR does under the hood when it is enabled.

When you enable maximizeResourceAllocation, EMR calculates the maximum compute and memory resources available for an executor on an instance in the core instance group. It then sets the corresponding spark-defaults settings based on the calculated maximum values

The above text snippet is from the Amazon EMR documentation.

It looks like EMR derives the executor size from the instance type in the core instance group and then writes the corresponding spark-defaults, so the executor count you end up with depends on that calculation. I'm not completely sure of the details though.
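
From memory of the EMR documentation, the settings it computes and writes into spark-defaults are roughly the following (the values below are placeholders, not what EMR would actually pick for an m4.4xlarge):

      spark.executor.instances      <computed from the core instance group>
      spark.executor.cores          <computed from the instance's vCores>
      spark.executor.memory         <computed from the instance's YARN memory>
      spark.driver.memory           <computed from the instance type>
      spark.default.parallelism     <2x the vCores available to YARN containers>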

One follow-up I'd like to know: are you running any other applications on your cluster? That can change the calculation EMR does to determine the resource allocation.

Case 2: Default Values

I can explain this.

This is using something called dynamic resource allocation in Spark. From the Spark documentation:

Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. This feature is particularly useful if multiple applications share resources in your Spark cluster.

And looking at the link above, it mentions that Amazon EMR uses dynamic resource allocation as the default setting for Spark.

So the executor sizes and counts that you are seeing are due to the workload you are running.
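
If you want a fixed executor footprint instead, dynamic allocation can be disabled or bounded explicitly; a sketch of the relevant properties in spark-defaults style (the numbers are placeholders, not recommendations):

      # defaults to true on EMR; set to false if you want a fixed spark.executor.instances
      spark.dynamicAllocation.enabled            true
      spark.dynamicAllocation.minExecutors       1
      spark.dynamicAllocation.maxExecutors       9
      spark.dynamicAllocation.initialExecutors   3
      # the external shuffle service is required when dynamic allocation is on
      spark.shuffle.service.enabled              true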

Case 3: Specific Values

To understand the allocation here, you need to understand the memory allocation model of Spark.

Spark executors and drivers run as JVMs, and they divide their JVM memory into:

  1. On-Heap Space, further divided into
    • Storage Memory - for cached data
    • Execution Memory - temporarily holds data for shuffles, joins and sorts
    • User Memory - for RDD conversions and dependency graphs
    • Reserved Memory - for Spark's internal objects
  2. Off-Heap Space - disabled by default

The memory allocation for the on-heap space is as follows:

X  = executor memory configured (spark.executor.memory)
X' = X - 300 MB (reserved memory, set aside up front so Spark itself can run)

User Memory                = X' * 0.40
Storage + Execution Memory = X' * 0.60   (spark.memory.fraction, default 0.6)

Storage and execution memory are split 50-50 by default (spark.memory.storageFraction = 0.5).

For your example:

X  = 18 GB ≈ 18000 MB
X' = 18000 - 300 = 17700 MB remaining

User Memory         = 17700 * 0.40 = 7080 MB
Storage + Execution = 17700 * 0.60 = 10620 MB ≈ 10 GB

And that is what you're seeing in the Storage Memory and On-heap storage memory columns.

Note: the boundary between storage and execution memory is flexible. If required, execution can borrow from (and evict) storage, and storage can borrow unused execution memory.

Plugging the numbers in, this lines up with your observations, and it explains why you only see about 10 GB of the 18 GB you allocated.
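
As a quick sanity check, here is the same arithmetic as a few lines of Python, using Spark's defaults of 300 MB reserved memory and spark.memory.fraction = 0.6:

      # approximate the unified (storage + execution) memory the Spark UI reports
      def unified_memory_mb(executor_memory_mb, reserved_mb=300, memory_fraction=0.6):
          usable = executor_memory_mb - reserved_mb   # take off the fixed reserved memory
          return usable * memory_fraction             # spark.memory.fraction of what remains

      print(unified_memory_mb(18000))  # 18 GB ~ 18000 MB -> 10620.0 MB, i.e. roughly the 10 GB observed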

Hope that adds a bit more clarity to at least two of the cases you are seeing.

m_vemuri