
When running a Spark Streaming app that consumes data from a Kafka topic with 100 partitions, using 10 executors with 5 cores and 20 GB of RAM per executor, the executors crash with the following log (the job setup is sketched below the log):

ERROR ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred.

ERROR YarnClusterScheduler: Lost executor 18 on worker23.oct.com: Slave lost

ERROR ApplicationMaster: RECEIVED SIGNAL TERM
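For context, the job is wired up roughly like this. This is only a minimal sketch assuming the spark-streaming-kafka-0-10 direct stream API; the app name, brokers, topic, group id and batch interval below are placeholders rather than our actual values, but the executor resources match what we run with (they could equally be passed as spark-submit flags):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

// Resource settings matching the description above
val conf = new SparkConf()
  .setAppName("kafka-streaming-job")          // placeholder app name
  .set("spark.executor.instances", "10")      // 10 executors
  .set("spark.executor.cores", "5")           // 5 cores per executor
  .set("spark.executor.memory", "20g")        // 20 GB RAM per executor

val ssc = new StreamingContext(conf, Seconds(10))  // placeholder batch interval

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker1:9092",       // placeholder brokers
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "streaming-consumer", // placeholder group id
  "auto.offset.reset"  -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// Direct stream over the 100-partition topic: one Kafka partition per RDD partition
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Seq("events"), kafkaParams)  // placeholder topic name
)
```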

This exception appears in the Spark JIRA:

https://issues.apache.org/jira/browse/SPARK-17380

Someone there wrote that the problem was solved after upgrading to Spark 2.0.2. However, we use Spark 2.1 as part of HDP 2.6, so I guess this bug wasn't fixed in Spark 2.1.

There's also someone who encountered this bug and wrote about it on the Spark user list, but got no answer:

http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Receiver-Resource-Leak-td27857.html

BTW - the streaming app doesn't call cache() or persist(), so no caching is involved whatsoever.
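Concretely, continuing the sketch above, each batch is just a plain foreachRDD with a placeholder transformation and sink; there is no cache()/persist() call anywhere:

```scala
// Per-batch processing: plain transformations and an action, nothing is cached or persisted
stream.foreachRDD { rdd =>
  rdd.map(record => record.value)        // placeholder transformation
     .foreachPartition { values =>
       values.foreach(v => println(v))   // placeholder sink, e.g. a write to an external store
     }
}

ssc.start()
ssc.awaitTermination()
```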

Has anyone encountered a streaming app that crashed on this bug?

Elad Eldor
