I have a Spark standalone setup (v1.4.1) with 3 workers.
I have an application that reads a stream from a Kafka topic, processes the data, and writes the result to another Kafka topic.
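For context, the job is submitted roughly like this (a sketch, not the exact command: the class name, jar path, master URL and core count are placeholders; the 1g executor memory matches the -Xms1024M/-Xmx1024M values visible in the launch commands below):

    # Hypothetical submit command for the streaming job (names/paths are placeholders)
    spark-submit \
      --class com.example.StreamElaboration \
      --master spark://master-host:7077 \
      --executor-memory 1g \
      --total-executor-cores 3 \
      /path/to/stream-elaboration.jar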
Last night the application crashed and all the workers went down.
The worker logs contain entries like the following:
16/02/04 21:02:10 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=52180" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54330" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@driverHost:52180/user/CoarseGrainedScheduler" "--executor-id" "24279" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160201182749-0007" "--worker-url" "akka.tcp://sparkWorker@worker2:57853/user/Worker"
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24279/stdout with daily rolling
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24279/stderr with daily rolling
16/02/04 21:02:10 INFO Worker: Executor app-20160129184621-0001/1430 finished with state EXITED message Command exited with code 1 exitStatus 1
16/02/04 21:02:10 INFO Worker: Asked to launch executor app-20160129184621-0001/1431 for stream-elaboration
16/02/04 21:02:10 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=57297" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54326" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@driverHost:57297/user/CoarseGrainedScheduler" "--executor-id" "1431" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160129184621-0001" "--worker-url" "akka.tcp://sparkWorker@worker2:57853/user/Worker"
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1431/stdout with daily rolling
16/02/04 21:02:10 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1431/stderr with daily rolling
16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24279 finished with state EXITED message Command exited with code 1 exitStatus 1
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160201182749-0007/24280 for stream-elaboration
16/02/04 21:02:11 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=52180" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54330" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@driverHost:52180/user/CoarseGrainedScheduler" "--executor-id" "24280" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160201182749-0007" "--worker-url" "akka.tcp://sparkWorker@worker2:57853/user/Worker"
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24280/stdout with daily rolling
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160201182749-0007/24280/stderr with daily rolling
16/02/04 21:02:11 INFO Worker: Executor app-20160129184621-0001/1431 finished with state EXITED message Command exited with code 1 exitStatus 1
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160129184621-0001/1432 for stream-elaboration
16/02/04 21:02:11 INFO ExecutorRunner: Launch command: "/opt/jdk1.8.0_45/bin/java" "-cp" "/dati/spark-1.4.1-bin-hadoop2.4/sbin/../conf/:/dati/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar:/dati/spark-1.4.1-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=57297" "-DenabledWorkerLog=false" "-Dcom.sun.management.jmxremote.port=54326" "-Dcom.sun.management.jmxremote.ssl=false" "-Dcom.sun.management.jmxremote.authenticate=false" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@driverHost:57297/user/CoarseGrainedScheduler" "--executor-id" "1432" "--hostname" "worker2" "--cores" "1" "--app-id" "app-20160129184621-0001" "--worker-url" "akka.tcp://sparkWorker@worker2:57853/user/Worker"
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1432/stdout with daily rolling
16/02/04 21:02:11 INFO FileAppender: Rolling executor logs enabled for /dati/spark-1.4.1-bin-hadoop2.4/work/app-20160129184621-0001/1432/stderr with daily rolling
16/02/04 21:02:11 INFO Worker: Executor app-20160201182749-0007/24280 finished with state EXITED message Command exited with code 1 exitStatus 1
16/02/04 21:02:11 INFO Worker: Asked to launch executor app-20160201182749-0007/24281 for stream-elaboration
At the end of the log:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp291507283-42"
Exception in thread "qtp291507283-37" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "ExecutorRunner for app-20160201182749-0007/29488" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkWorker-scheduler-1"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp291507283-38" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "JMX server connection timeout 81"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "JMX server connection timeout 81"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "sparkWorker-10"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp291507283-40"
Exception in thread "qtp291507283-35" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "qtp291507283-39" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp291507283-41" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp291507283-36" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RMI TCP Connection(idle)"
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "RMI TCP Connection(idle)"
....
If I run:
ps aux | grep "worker"
the worker process is still alive, but I can't see it in the Spark UI.
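To dig a bit further, this is roughly how I am inspecting the surviving Worker JVM (the PID is whatever ps reports; jstat and jmap come from the same JDK 8 install the launch commands use; SPARK_DAEMON_MEMORY is the spark-env.sh setting that controls the Worker daemon's own heap, which defaults to 1g):

    # Watch GC activity of the surviving Worker JVM (<worker-pid> taken from the ps output)
    /opt/jdk1.8.0_45/bin/jstat -gcutil <worker-pid> 5000

    # Show the biggest heap consumers inside the Worker JVM
    /opt/jdk1.8.0_45/bin/jmap -histo <worker-pid> | head -n 20

    # Check the heap configured for the master/worker daemons (default 1g if unset)
    grep SPARK_DAEMON_MEMORY /dati/spark-1.4.1-bin-hadoop2.4/conf/spark-env.sh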
Why do the worker executors restart so frequently?