
I've been searching a lot for a solution to the following issue. I'm using Scala 2.11.8 and Spark 2.1.0.

Application application_1489191400413_3294 failed 1 times due to AM Container for appattempt_1489191400413_3294_000001 exited with exitCode: -104
For more detailed output, check the application tracking page: http://ip-172-31-17-35.us-west-2.compute.internal:8088/cluster/app/application_1489191400413_3294 Then, click on links to logs of each attempt.
Diagnostics: Container [pid=23372,containerID=container_1489191400413_3294_01_000001] is running beyond physical memory limits. 
Current usage: 1.4 GB of 1.4 GB physical memory used; 3.5 GB of 6.9 GB virtual memory used. Killing container.

Note that I've allotted a lot more than the 1.4 GB being reported in the error here. Since none of my executors are failing, my read of this error was that the driver needs more memory. However, my settings don't seem to be propagating through.

I'm setting job parameters for YARN as follows:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName(jobName)
  .set("spark.hadoop.mapred.output.committer.class", "com.company.path.DirectOutputCommitter")

// merge in the memory/core settings shown below
additionalSparkConfSettings.foreach { case (key, value) => conf.set(key, value) }

// this is the implicit that we pass around
implicit val sparkSession = SparkSession
  .builder()
  .appName(jobName)
  .config(conf)
  .getOrCreate()

where the memory provisioning parameters in additionalSparkConfSettings were set with the following snippet:

import scala.collection.immutable.HashMap

HashMap[String, String](
  "spark.driver.memory" -> "8g",
  "spark.executor.memory" -> "8g",
  "spark.executor.cores" -> "5",
  "spark.driver.cores" -> "2",
  "spark.yarn.maxAppAttempts" -> "1",
  "spark.yarn.driver.memoryOverhead" -> "8192",
  "spark.yarn.executor.memoryOverhead" -> "2048"
)

Are my settings really not propagating? Or am I misinterpreting the logs?

Thanks!

Navneet
  • I changed `spark.yarn.driver.memoryOverhead` to 10240 and the job still failed with the exact same error I mentioned above. However, when I increased `spark.driver.memory` by a couple of GB, it succeeded. It seems like the `memoryOverhead` configs are really not taking effect. – Navneet Mar 28 '17 at 00:41
  • was this issue resolved? – ilcord Jul 25 '17 at 15:13

2 Answers


The memory overhead needs to be set for both the executor and the driver, and it should be a fraction of the executor and driver memory:

spark.yarn.executor.memoryOverhead = executorMemory * 0.10, with minimum of 384 

The amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).

spark.yarn.driver.memoryOverhead = driverMemory * 0.10, with minimum of 384.

The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the container size (typically 6-10%).
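To make that default concrete, here is a minimal sketch of the rule above (the helper name `yarnOverheadMB` is just for illustration, not a Spark API):

// Default YARN memory overhead as described above:
// max(10% of the heap, 384 MB). The helper name is hypothetical.
def yarnOverheadMB(heapMB: Long): Long = math.max((heapMB * 0.10).toLong, 384L)

// With the 8g heaps from the question, the default overhead is ~819 MB,
// so YARN is asked for roughly heap + overhead per container.
val driverContainerMB = 8192L + yarnOverheadMB(8192L) // 8192 + 819 = 9011 MB

Explicitly setting `spark.yarn.driver.memoryOverhead` or `spark.yarn.executor.memoryOverhead`, as in the question, overrides this default.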

To learn more about memory optimization, please see the Memory Management Overview.

Also see the following SO thread: Container is running beyond memory limits

Cheers!

Sachin Thapa

The problem in my case was simple, albeit easy to miss.

Setting driver-level parameters inside the application code does not work: by the time that code runs, the driver JVM has already been launched, so the configuration is ignored. I confirmed this with a few tests when I solved it months ago.

Executor parameters, however, can be set in code. Still, keep parameter precedence rules in mind if you end up setting the same parameter in more than one place.
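As a rough illustration (a hypothetical check, assuming the `sparkSession` from the question is in scope), the configuration can happily report the value you set in code even though the driver JVM heap was already fixed when the process launched:

// Hypothetical sanity check: compare what the configuration reports
// with the heap the driver JVM was actually started with.
val configuredDriverMem =
  sparkSession.conf.getOption("spark.driver.memory").getOrElse("<unset>")
val actualDriverHeapMB = Runtime.getRuntime.maxMemory / (1024L * 1024L)

println(s"spark.driver.memory (as configured): $configuredDriverMem")
println(s"driver JVM max heap (actual):        $actualDriverHeapMB MB")

Driver memory therefore has to be supplied before the driver JVM starts, for example via --driver-memory on spark-submit or in spark-defaults.conf.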

Navneet