I am entering the world of Spark. I use Spark v1.5.0 (cdh5.5.1). I run a Spark job that reads .gz files between 30 MB and 150 MB in size, with a compression factor of ~20. I process around 300 of them on 50 executors, using Yarn in yarn-client mode. The job first reads the data from the files and transforms it into an RDD[List[String]] (a simple Spark map).
I figured out, thanks to this SO question, that my job was failing because "someone" was killing my executors, but it was not trivial to find out who: the only error I could see in the logs (the merged stdout and stderr of all containers, retrieved with the yarn logs command) was:
```
16/02/12 08:28:00 INFO rdd.NewHadoopRDD: Input split: hdfs://xxx:8020/user/mjost/file001.gz:0+39683663
16/02/12 08:28:38 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
```
I suspect Yarn kills them because they take more memory than reserved. I managed to fix the issue by increasing spark.yarn.executor.memoryOverhead (shown after my question below), but I would like to understand why Yarn kills them, so that I can handle the situation better. My question:
- Where can I get more precise information on why the executors were killed?
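For completeness, this is roughly how I raised the overhead; the memory values are just the ones that happened to work for me, and the same setting can equivalently be passed on the spark-submit command line as --conf spark.yarn.executor.memoryOverhead=4096:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.yarn.executor.memoryOverhead is the off-heap headroom (in MB) that Yarn adds
// on top of spark.executor.memory when sizing the container. In Spark 1.5 it defaults
// to max(384 MB, 10% of executor memory); a container that exceeds its total gets
// killed, which would explain the SIGTERM in the log above.
val conf = new SparkConf()
  .setAppName("gz-ingest")                           // illustrative app name
  .set("spark.executor.memory", "6g")                // illustrative executor heap
  .set("spark.yarn.executor.memoryOverhead", "4096") // MB; the value that worked for me
val sc = new SparkContext(conf)
```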