My Environment:
- Databricks 10.4
- PySpark
I'm looking into Spark performance, specifically the memory/disk spills that are shown in the Spark UI under the Stages section.
What I want to achieve is to get notified if my job had spills.
I have found the SpillListener class, but I'm not sure how it works: https://spark.apache.org/docs/3.1.3/api/java/org/apache/spark/SpillListener.html
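From what I can tell, SpillListener is a JVM-side SparkListener that counts the stages in which at least one task spilled, so it isn't directly usable from Python. The sketch below is what I imagine registering it through the py4j gateway would look like (assumptions: the class is on the driver classpath in this runtime, and the `_jvm`/`_jsc` internals behave as they usually do; I have not verified this on Databricks):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# SpillListener lives on the JVM side, so it has to be created and
# registered through the py4j gateway (assumption: class is available).
spill_listener = sc._jvm.org.apache.spark.SpillListener()
sc._jsc.sc().addSparkListener(spill_listener)

# ... run the job ...

# numSpilledStages counts stages in which at least one task spilled.
print(spill_listener.numSpilledStages())
```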
I want a smart way to find where the major spills are, rather than going through all the jobs/stages manually.
Ideally, I want to detect spills programmatically using PySpark.
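One idea I'm considering is polling the Spark monitoring REST API from the driver, since the per-stage data it returns includes memoryBytesSpilled and diskBytesSpilled. A minimal sketch of what I have in mind is below (assumption: the driver can reach its own Spark UI via `sc.uiWebUrl`, which may not hold behind the Databricks proxy):

```python
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Assumption: the raw Spark UI port is reachable from the driver itself.
base_url = sc.uiWebUrl
app_id = sc.applicationId

# The monitoring REST API exposes per-stage metrics, including
# memoryBytesSpilled and diskBytesSpilled.
stages = requests.get(f"{base_url}/api/v1/applications/{app_id}/stages").json()

# Keep only stages that actually spilled.
spilled = [
    s for s in stages
    if s.get("memoryBytesSpilled", 0) > 0 or s.get("diskBytesSpilled", 0) > 0
]

# Report the worst offenders first.
for s in sorted(spilled, key=lambda s: s["diskBytesSpilled"], reverse=True):
    print(
        f"Stage {s['stageId']} ({s['name']}): "
        f"memory spill = {s['memoryBytesSpilled']:,} B, "
        f"disk spill = {s['diskBytesSpilled']:,} B"
    )
```

Is something like this a reasonable way to do it, or is there a better-supported approach for getting notified about spills?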