I already saw this question: How to implement custom job listener/tracker in Spark? I also checked the source code to find out how to get the number of stages per job, but is there any way to programmatically track the percentage of jobs that have completed in a Spark app?
I can probably get the number of finished jobs with the listeners, but I'm missing the total number of jobs that will be run.
I want to track the progress of the whole app, which creates quite a few jobs, but I can't find that total anywhere.
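To illustrate the listener side: here's a minimal sketch in Scala (`JobProgressListener` is just a name I made up) that counts jobs as they start and finish. Nothing in it gives the total number of jobs the app will eventually submit, which is the part I'm missing:

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Hypothetical listener: counts jobs as they start and finish.
// "started" is only the number submitted so far, not the app's total,
// so it can't serve as the denominator for overall progress.
class JobProgressListener extends SparkListener {
  val started = new AtomicInteger(0)
  val finished = new AtomicInteger(0)

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    started.incrementAndGet()
  }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    finished.incrementAndGet()
    println(s"jobs finished: ${finished.get} of ${started.get} submitted so far")
  }
}

// Registration (sc is the app's SparkContext):
// sc.addSparkListener(new JobProgressListener())
```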
@Edit: I know there's a REST endpoint for getting all the jobs in an app, but:
- I would prefer not to use REST but to get the information in the app itself (Spark is running on AWS EMR/YARN; getting the REST address is probably doable, but I'd rather not)
- that REST endpoint seems to return only jobs that are running/finished/failed, so not the total number of jobs.
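For reference, the endpoint I mean is the monitoring REST API served on the driver's UI port. A sketch of the call, assuming the default port 4040 (the driver host placeholder is whatever YARN assigns):

```
curl http://<driver-host>:4040/api/v1/applications/<app-id>/jobs
```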