I'm creating the Spark context and session in my PySpark code like this:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
conf = SparkConf().set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)
spark.sparkContext.setCheckpointDir("../../checkpoints")
In the code that follows, I call checkpoint() on some DataFrames, and it works as expected.
However, I want to remove the checkpoint files once the code has run to completion.
Is there a Spark configuration that I can use for this? Setting spark.cleaner.referenceTracking.cleanCheckpoints to "true" does not remove them.
How can I delete those checkpoint files when the code is completed? What is the best approach?
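The fallback I have in mind is to delete the directory myself once the job finishes. Below is a minimal sketch of that idea, assuming the checkpoint directory lives on the local filesystem (a directory on HDFS or S3 would presumably need a different mechanism). The helper name remove_checkpoint_dir is my own and hypothetical, not part of Spark:

```python
import shutil
from pathlib import Path


def remove_checkpoint_dir(checkpoint_dir: str) -> bool:
    """Delete a local checkpoint directory and everything under it.

    Hypothetical helper, not a Spark API. Assumes a local filesystem path.
    Returns True if the directory existed and was removed, False otherwise.
    """
    path = Path(checkpoint_dir)
    if not path.is_dir():
        # Nothing to clean up (already removed, or never created).
        return False
    shutil.rmtree(path)
    return True
```

I would call remove_checkpoint_dir("../../checkpoints") after spark.stop(), probably in a finally block so the cleanup also runs if the job fails. This feels like a workaround rather than the intended approach, though.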