I faced this exception on a Spark worker node:
Exception in thread "dispatcher-event-loop-14" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap.newNode(HashMap.java:1747)
at java.util.HashMap.putVal(HashMap.java:631)
at java.util.HashMap.put(HashMap.java:612)
at java.util.HashSet.add(HashSet.java:220)
at java.io.ObjectStreamClass.getClassDataLayout0(ObjectStreamClass.java:1317)
at java.io.ObjectStreamClass.getClassDataLayout(ObjectStreamClass.java:1295)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1480)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.rpc.netty.RequestMessage.serialize(NettyRpcEnv.scala:557)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:520)
at org.apache.spark.deploy.worker.Worker.org$apache$spark$deploy$worker$Worker$$sendToMaster(Worker.scala:638)
at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:524)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:216)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Java HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.OutOfMemoryError occurred dispatching signal SIGTERM to handler- the VM may need to be forcibly terminated.
Just before this exception, the worker was repeatedly relaunching an executor because the executor kept exiting with:
EXITING with Code 1 and exitStatus 1
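(On that relaunch loop: from the standalone-mode docs, I believe the master can be told to give up on an application after repeated executor failures via spark.deploy.maxExecutorRetries. A sketch of what I'd try in spark-defaults.conf, with a placeholder value:)

# spark-defaults.conf (sketch; the default is 10)
# If an executor fails this many times in a row, with no other executors
# running in between, the master removes the application instead of
# relaunching forever.
spark.deploy.maxExecutorRetries  3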
Configs:
- -Xmx for worker process = 1GB
- Total RAM on worker node = 100GB
- Java 8
- Spark 2.2.1
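Given the 1GB heap cap, this is what I plan to change in spark-env.sh (a sketch with placeholder values; SPARK_DAEMON_MEMORY sets the heap for the standalone master/worker daemons, and -XX:+ExitOnOutOfMemoryError needs JDK 8u92+ as far as I know):

# spark-env.sh (sketch; values are placeholders)
# Heap for the standalone master/worker daemons (default 1g, matching the -Xmx above)
export SPARK_DAEMON_MEMORY=4g
# Extra JVM flags for the daemons: exit on the first OOM instead of lingering,
# and dump the heap so the leak can be inspected afterwards
export SPARK_DAEMON_JAVA_OPTS="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError"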
When this exception occurred, 90% of system memory was free. After the exception the process is still up, but the worker is disassociated from the master and is not processing anything.
Now, as per the https://stackoverflow.com/a/48743757 thread, my understanding is that the worker process ran into the OutOfMemoryError because of the repeated executor submissions. At that point some process sent SIGTERM to the worker JVM, and the JVM hit another OutOfMemoryError while handling the signal.
- Which process could have sent the SIGTERM?
- Since there was plenty of system memory available, why did the OS (or whichever process it was) send the signal? Shouldn't the JVM exit by itself on an OutOfMemoryError?
- Why did an OutOfMemoryError occur while the JVM was handling the SIGTERM?
- Why is the process still up?
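To chase the first question myself, I'm planning to log kill(2) calls with auditd (a sketch, assuming auditd is installed; a1 is the signal argument of the syscall and 15 is SIGTERM on Linux):

# Log every kill() that delivers signal 15, then inspect with: ausearch -k sigterm_trace
auditctl -a always,exit -F arch=b64 -S kill -F a1=15 -k sigterm_trace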