I have a Spark Streaming job (2.3.1, standalone cluster); a stripped-down sketch of its shape follows the specs below:
- ~50K RPS
- 10 executors (2-core, 7.5 GB RAM, 10 GB disk GCP nodes)
- Data rate ~20 Mb/sec, with batches (while the job is healthy) completing in ~0.5 s on a 1 s batch interval
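The class name, source, and per-batch logic in the sketch are placeholders, not the real code; it is only meant to show the shape of the job:
// stripped-down sketch of the job; placeholder source and per-batch work
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Switch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("switch")
    val ssc = new StreamingContext(conf, Seconds(1)) // 1 s batch interval

    // placeholder receiver; the real source delivers ~50K RPS
    val events = ssc.socketTextStream("source-host", 9999)

    events.foreachRDD { rdd =>
      // placeholder per-batch processing (~0.5 s of real work per 1 s batch)
      rdd.count()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}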
The problem is that the temporary files Spark writes to /tmp are never cleaned up, because the executor JVMs never terminate. Short of some clever batch job to clean out the /tmp directory, I am looking for the proper way to keep the job from crashing (and restarting) on "no space left on device" errors.
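For what it's worth, I understand these scratch files live under spark.local.dir, which defaults to /tmp. Pointing that at a larger volume, e.g. something like the hypothetical snippet below, would only relocate the growth rather than clean anything up:
# spark-defaults.conf (hypothetical mount point, for illustration only)
spark.local.dir /mnt/spark-scratch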
I have the following SPARK_WORKER_OPTS set:
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1200 -Dspark.worker.cleanup.appDataTtl=345600"
I have experimented with both the CMS and G1GC collectors; neither seemed to have any impact beyond changing GC times.
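For concreteness, the collector swaps were made roughly via executor JVM options along these lines (the exact flag sets varied between runs):
# spark-defaults.conf, one collector at a time (tuning flags omitted)
spark.executor.extraJavaOptions -XX:+UseG1GC
# spark.executor.extraJavaOptions -XX:+UseConcMarkSweepGC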
I have been through most of the documented settings and searched around, but have not been able to find anything else to try. I have to believe that people are running much bigger jobs as long-running, stable streams and that this is a solved problem.
Other config bits:
spark-defaults:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.broadcast.compress true
spark.streaming.stopGracefullyOnShutdown true
spark.ui.reverseProxy true
spark.cleaner.periodicGC.interval 20
...
spark-submit: (nothing special)
/opt/spark/bin/spark-submit \
--master spark://6.7.8.9:6066 \
--deploy-mode cluster \
--supervise \
/opt/application/switch.jar
As it stands, the job runs for ~90 minutes before the drives fill up and we crash. I could spin the nodes up with larger drives, but 90 minutes should have given the cleanup settings above, which should fire at 20-minute intervals, several chances to act. Larger drives would likely just prolong the issue.