We are using Spark with Java in our project, and the Hadoop distribution is MapR.
In our Spark jobs we persist data (at the disk storage level).
After a job completes, there is a lot of temporary data left inside the /tmp/ folder. How can we ensure that this temp data in /tmp/ is cleaned up once job execution finishes?
I found the following link: Apache Spark does not delete temporary directories
However, I am not sure how to set the properties it mentions:
spark.worker.cleanup.enabled
spark.worker.cleanup.interval
spark.worker.cleanup.appDataTtl
Also, where should these properties be set: 1. in code, or 2. in the Spark configuration?
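If setting them in code (option 1) is the right approach, I imagine it would look roughly like the sketch below. The app name and values are just placeholders, and I am not sure whether these worker-cleanup settings even take effect when set from application code rather than on the workers themselves:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CleanupConfSketch {
    public static void main(String[] args) {
        // Hypothetical sketch only: setting the cleanup properties on the
        // SparkConf before creating the context. Values are examples.
        SparkConf conf = new SparkConf()
                .setAppName("my-spark-job")
                .set("spark.worker.cleanup.enabled", "true")
                .set("spark.worker.cleanup.interval", "1800")     // seconds between cleanup runs
                .set("spark.worker.cleanup.appDataTtl", "86400"); // seconds to retain application data

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job logic that persists data to disk ...
        sc.stop();
    }
}
```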
We run the job in cluster mode (with master yarn) using the spark-submit command.
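If the Spark configuration (option 2) is the right approach, I assume the properties would either go into conf/spark-defaults.conf or be passed on the spark-submit command line, something like the following (the class name and jar are placeholders, and I am only guessing that --conf is the right way to pass these):

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.worker.cleanup.enabled=true \
  --conf spark.worker.cleanup.interval=1800 \
  --conf spark.worker.cleanup.appDataTtl=86400 \
  --class com.example.MyJob my-job.jar
```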
Thanks, Anuj