
In Spark, is it possible to register code to run at certain points in the job lifecycle, say when a new executor starts, when an executor shuts down, when a new partition is about to be processed, etc.?

Currently there are methods like foreachPartition or foreachRDD, but their execution context has to be inferred implicitly, and any hook code has to be written inline rather than registered separately.

pramodbiligiri
  • How about creating a SparkListener? Several similar questions on SO, for example [How to programmatically get information about executors in PySpark](https://stackoverflow.com/questions/62526301/how-to-programmatically-get-information-about-executors-in-pyspark) – mazaneicha Dec 16 '22 at 19:22
  • Thanks. SparkListener seems to fit the bill. The docs page for that shows a lot of event hooks: https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/scheduler/SparkListener.html – pramodbiligiri Dec 21 '22 at 05:00
  • Not "a lot", ALL events arriving into Spark event queue :) – mazaneicha Dec 21 '22 at 12:50

0 Answers