
I am running a job on AWS EMR 4.1 with Spark 1.5, using the following configuration:

spark-submit --deploy-mode cluster --master yarn-cluster --driver-memory 200g --driver-cores 30 --executor-memory 70g --executor-cores 8 --num-executors 90 --conf spark.storage.memoryFraction=0.45 --conf spark.shuffle.memoryFraction=0.75 --conf spark.task.maxFailures=1 --conf spark.network.timeout=1800s

Then I got the error below. Where can I find out what "Exit status: -100" means, and how might I fix this problem? Thanks!


15/12/05 05:54:24 INFO TaskSetManager: Finished task 176.0 in stage 957.0 (TID 128408) in 130885 ms on ip-10-155-195-239.ec2.internal (106/800)
15/12/05 05:54:24 INFO YarnAllocator: Completed container container_1449241952863_0004_01_000026 (state: COMPLETE, exit status: -100)
15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: container_1449241952863_0004_01_000026. Exit status: -100. Diagnostics: Container released on a *lost* node
15/12/05 05:54:24 INFO YarnAllocator: Completed container container_1449241952863_0004_01_000055 (state: COMPLETE, exit status: -100)
15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: container_1449241952863_0004_01_000055. Exit status: -100. Diagnostics: Container released on a *lost* node
15/12/05 05:54:24 ERROR YarnClusterScheduler: Lost executor 24 on ip-10-147-11-212.ec2.internal: Yarn deallocated the executor 24 (container container_1449241952863_0004_01_000026)
15/12/05 05:54:24 INFO TaskSetManager: Re-queueing tasks for 24 from TaskSet 957.0
15/12/05 05:54:24 WARN TaskSetManager: Lost task 382.0 in stage 957.0 (TID 128614, ip-10-147-11-212.ec2.internal): ExecutorLostFailure (executor 24 lost)
15/12/05 05:54:24 ERROR TaskSetManager: Task 382 in stage 957.0 failed 1 times; aborting job
15/12/05 05:54:24 WARN TaskSetManager: Lost task 208.0 in stage 957.0 (TID 128440, ip-10-147-11-212.ec2.internal): ExecutorLostFailure (executor 24 lost)
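
For anyone triaging a log like the one above, here is a minimal Python sketch (not part of the job itself) that groups the failed containers by exit status and diagnostics message, which makes it easier to see whether every loss shares the same cause. The function name `summarize_failed_containers` and the regular expression are illustrative assumptions based only on the `YarnAllocator` line format shown in this log; other Spark or YARN versions may format these lines differently.

```python
import re
from collections import defaultdict

# Matches YarnAllocator lines of the form seen in the log above:
#   "... Container marked as failed: container_<id>. Exit status: -100. Diagnostics: <text>"
FAILED_RE = re.compile(
    r"Container marked as failed: (?P<container>container_\w+)\. "
    r"Exit status: (?P<status>-?\d+)\. Diagnostics: (?P<diag>.*)"
)


def summarize_failed_containers(lines):
    """Group failed container IDs by (exit status, diagnostics message)."""
    summary = defaultdict(list)
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            key = (int(match.group("status")), match.group("diag").strip())
            summary[key].append(match.group("container"))
    return dict(summary)


# Two lines copied from the log in this question.
sample = [
    "15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: "
    "container_1449241952863_0004_01_000026. Exit status: -100. "
    "Diagnostics: Container released on a *lost* node",
    "15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: "
    "container_1449241952863_0004_01_000055. Exit status: -100. "
    "Diagnostics: Container released on a *lost* node",
]

result = summarize_failed_containers(sample)
for (status, diag), containers in result.items():
    print(status, diag, containers)
```

In practice the `sample` list would be replaced by the lines of the driver log file, read with an ordinary `open(...)` call.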
Edamame
  • Just for future reference, check the node manager/instance state logs on the executor node to find out more about why the executor is lost. – annunarcist Dec 04 '16 at 07:32
  • How and where do you find those state logs? – matanster Apr 22 '20 at 05:01
  • I answered this question at https://stackoverflow.com/questions/38155421/spark-on-yarn-mode-end-with-exit-status-100-diagnostics-container-released – data_addict Jul 21 '20 at 11:48
  • Indeed, voted to close as a duplicate of: [Spark on yarn mode end with "Exit status: -100. Diagnostics: Container released on a \*lost\* node"](https://stackoverflow.com/questions/38155421/spark-on-yarn-mode-end-with-exit-status-100-diagnostics-container-released) – Dennis Jaheruddin Aug 01 '20 at 20:59
