
I'm creating the Spark context and session in my PySpark code like this:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

conf = SparkConf().set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")
sc = SparkContext.getOrCreate(conf=conf)

spark = SparkSession(sc)
spark.sparkContext.setCheckpointDir("../../checkpoints")

In the code that follows, I'm calling checkpoint() on some DataFrames. It works as expected.

But I want to remove the checkpoint files after the code has run to completion.

Is there a Spark configuration I can use for this? cleanCheckpoints is not doing that.

How can I delete those checkpoint files once the code has completed? What is the best approach?

Sreeram TP

1 Answer


Write your cleanup logic inside the onApplicationStart or onApplicationEnd method of a SparkListener, and take a look at the SparkListener abstract class for other useful methods.

Note: the Scala code below shows how to register a SparkListener and override these methods.


import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerApplicationStart}

spark.sparkContext.addSparkListener(new SparkListener() {

  // Fired once when the application starts
  override def onApplicationStart(applicationStart: SparkListenerApplicationStart): Unit = {
    println("Spark ApplicationStart: " + applicationStart.appName)
  }

  // Fired once when the application ends; a good place for cleanup
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
    println("Spark ApplicationEnd: " + applicationEnd.time)
  }
})
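
For the cleanup itself, here is a minimal sketch of deleting the checkpoint directory from onApplicationEnd. It assumes the "../../checkpoints" path from the question, and it captures the Hadoop configuration up front, since the SparkContext is already shutting down by the time the event fires:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Path passed to setCheckpointDir in the question
val checkpointDir = new Path("../../checkpoints")
// Capture the Hadoop configuration before shutdown begins
val hadoopConf = spark.sparkContext.hadoopConfiguration

spark.sparkContext.addSparkListener(new SparkListener() {
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
    // Recursively delete the entire checkpoint directory
    FileSystem.get(hadoopConf).delete(checkpointDir, true)
  }
})

If you don't need this tied to a listener, the same delete call in a finally block after your job finishes works just as well.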

Srinivas