My program uses quite a few jar files, which are copied to the work directory of each executor of the app. These directories are located under $SPARK_HOME/work and hold the program's libraries and logs (stdout and stderr). Note that I am not talking about Spark's tmp directories here; those are something else.

As these directories can get quite big, I want to remove them as soon as my program is done. One way, obviously, is to write a script myself to do that, but is there a way I can command Spark to do it for me, i.e. delete them as soon as the program finishes?

MetallicPriest
  • Possible duplicate of [Apache Spark does not delete temporary directories](https://stackoverflow.com/questions/30093676/apache-spark-does-not-delete-temporary-directories) – ernest_k Apr 17 '19 at 11:25
  • tmp directories are something else. They are put somewhere in the /tmp folder by default. These work directories contain stdout, stderr and the libraries. – MetallicPriest Apr 17 '19 at 11:26
  • There is one article on this, https://interset.zendesk.com/hc/en-us/articles/218569327-Spark-work-directory-using-lots-of-disk-space. One can even change the location of this directory by using SPARK_WORKER_DIR. – MetallicPriest Apr 17 '19 at 11:32
  • What launch mode are you using? Standalone, YARN or something else? – Serge Harnyk Apr 17 '19 at 15:47
  • I am using Standalone mode. – MetallicPriest Apr 17 '19 at 16:17
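For reference, in standalone mode the worker can be told to clean up finished applications' work directories on its own. A minimal sketch, assuming Spark's documented `spark.worker.cleanup.*` properties and the `SPARK_WORKER_DIR` variable mentioned in the comments (the example path is hypothetical):

```shell
# conf/spark-env.sh on each worker node (standalone mode only)

# Optionally move the work directory off $SPARK_HOME/work
# (example path; choose one that suits your disks)
export SPARK_WORKER_DIR=/data/spark-work

# Enable periodic cleanup of work directories. Note the cleanup only
# covers applications that have already stopped, and runs on an interval
# rather than immediately when the program finishes.
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=3600"
```

With `cleanup.interval` (seconds between sweeps) and `cleanup.appDataTtl` (how long a stopped application's data is retained) tuned down, this approximates "delete shortly after the program is done", though not instantly at program exit.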

0 Answers