When running a Spark Streaming app that consumes data from a Kafka topic with 100 partitions, using 10 executors with 5 cores and 20 GB RAM each, the executors crash with the following log:
ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred.
ERROR YarnClusterScheduler: Lost executor 18 on worker23.oct.com: Slave lost
ERROR ApplicationMaster: RECEIVED SIGNAL TERM
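For context, the app is a fairly standard direct Kafka stream. The sketch below is only a minimal approximation of that setup, not the actual application code: the broker list, topic name ("events"), consumer group id, batch interval, and the println processing are all assumptions made for illustration; the resource settings in the comment mirror the configuration described above.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaStreamingApp {
  def main(args: Array[String]): Unit = {
    // Submitted roughly as (assumed):
    //   spark-submit --master yarn --deploy-mode cluster \
    //     --num-executors 10 --executor-cores 5 --executor-memory 20g ...
    val conf = new SparkConf().setAppName("kafka-streaming-app")
    val ssc = new StreamingContext(conf, Seconds(10)) // hypothetical batch interval

    // Hypothetical broker list and consumer settings; the real topic has 100 partitions.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "streaming-app-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream from the spark-streaming-kafka-0-10 integration.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Placeholder processing: the real logic is app-specific,
    // and notably it never calls cache() or persist().
    stream.foreachRDD { rdd =>
      rdd.foreach(record => println(record.value()))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```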
This exception appears in the Spark JIRA:
https://issues.apache.org/jira/browse/SPARK-17380
One commenter there wrote that the problem was solved after upgrading to Spark 2.0.2. However, we use Spark 2.1 as part of HDP 2.6, so I guess this bug wasn't fixed in Spark 2.1.
Someone else who encountered this bug wrote about it on the Spark user list but got no answer.
BTW, the streaming app doesn't call cache() or persist(), so no caching is involved whatsoever.
Has anyone encountered a streaming app that crashed on this bug?