
The Spark cluster in our production environment runs jobs in standalone mode.

While a job was running, memory overflow on a few of the workers caused the worker node processes to die.

I would like to ask how to analyze the error shown in the image below:

[Screenshot: Spark Worker fatal error]

drkostas

1 Answer


EDIT: This is a relatively common problem; if the steps below don't help you, please also see Spark java.lang.OutOfMemoryError: Java heap space.

Without seeing your code, here is the process you should follow:

(1) If the issue is caused primarily by the Java allocation running out of space within the container allocation, I would advise adjusting your memory overhead settings (below). The current values are a little high and will cause excess spin-up of vcores. Add the two settings below to your spark-submit and re-run.

--conf "spark.yarn.executor.memoryOverhead=4000m"
--conf "spark.yarn.driver.memoryOverhead=2000m"  

(2) Adjust Executor and Driver Memory Levels. Start low and climb. Add these values to the spark-submit statement.

--driver-memory 10g
--executor-memory 5g

(3) Adjust the number of executors in the spark-submit.

--num-executors ##

(4) Look at the YARN stages of the job and figure out where inefficiencies are present in the code and where persists can be added or replaced (a sketch follows below). I would advise looking heavily into Spark tuning.
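
For point (4), here is a minimal PySpark sketch of the kind of persistence change that step is describing. The input path, column names, and app name are assumptions for illustration only, not taken from your job:

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Hypothetical input; replace with your own source.
events = spark.read.parquet("/data/events")

# A DataFrame that is reused by several actions is a candidate for persistence.
# MEMORY_AND_DISK spills partitions to disk instead of failing when executor memory is tight.
active = events.filter(events["status"] == "ACTIVE").persist(StorageLevel.MEMORY_AND_DISK)

active.count()                             # first action materializes the cached partitions
active.groupBy("country").count().show()   # later actions reuse them instead of recomputing

active.unpersist()                         # release the blocks once they are no longer needed

Replacing MEMORY_ONLY persists with MEMORY_AND_DISK and unpersisting data you no longer need are the kinds of changes point (4) refers to, and they tend to reduce pressure on executor memory.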

afeldman