
When I submit a Spark job using spark-submit with --master yarn and --deploy-mode cluster, it doesn't print/return any applicationId, and once the job is completed I have to manually check the MapReduce jobHistory or Spark HistoryServer to get the job details.
My cluster is used by many users, and it takes a lot of time to spot my job in the jobHistory/HistoryServer.

Is there any way to configure spark-submit to return the applicationId?

Note: I found many similar questions, but their solutions retrieve the applicationId within the driver code using sparkContext.applicationId, and with --master yarn and --deploy-mode cluster the driver itself runs on a cluster node as part of the YARN application, so anything it logs or prints ends up in that remote host's logs.

  • I'm not sure I get your note. The applicationId from the sparkcontext is the way to go – eliasah May 26 '17 at 21:20
  • As my driver is launched on one of the cluster nodes, how do I send the applicationId from that node to the client? Is there any out-of-the-box feature Spark provides? – Rahul Sharma May 26 '17 at 21:54
  • 1
    You can save applicationId to a file on hdfs.Many software use this way to keep processing id . – Zhang Tong May 27 '17 at 10:34
  • Thanks. Yeah, it makes sense to persist the applicationId on HDFS and let the client read it when required. Another solution I implemented is to notify the user of the applicationId via email. @zhangtong Please post your comment as an answer. – Rahul Sharma May 27 '17 at 15:56

1 Answer


Here are the approaches I used to achieve this; a rough sketch of each follows the list:

  1. Save the applicationId to an HDFS file (suggested by @zhangtong in the comments); see the first sketch below.
  2. Send an email alert with the applicationId from the driver; see the second sketch below.
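
For the HDFS approach, here is a minimal sketch of what the driver could look like, assuming Scala and Spark 2.x; the output path (/user/rahul/app-ids/my-job.appId) is a placeholder convention that the submitting client and the job would have to agree on:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    object MyJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("my-job").getOrCreate()
        val appId = spark.sparkContext.applicationId

        // Publish the applicationId at a well-known HDFS location
        // (placeholder path) so the submitting client can poll for it.
        val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
        val out = fs.create(new Path("/user/rahul/app-ids/my-job.appId"), true)
        out.writeBytes(appId)
        out.close()

        // ... actual job logic ...

        spark.stop()
      }
    }

After spark-submit returns, the client can read the id back with hdfs dfs -cat /user/rahul/app-ids/my-job.appId.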
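
For the email alert, a minimal sketch using the JavaMail API (the javax.mail jar must be on the driver's classpath); the SMTP host and addresses are placeholders you'd replace with your own:

    import java.util.Properties
    import javax.mail.{Message, Session, Transport}
    import javax.mail.internet.{InternetAddress, MimeMessage}

    object AppIdMailer {
      def sendAppIdMail(appId: String): Unit = {
        val props = new Properties()
        props.put("mail.smtp.host", "smtp.example.com") // placeholder SMTP relay

        val msg = new MimeMessage(Session.getInstance(props))
        msg.setFrom(new InternetAddress("spark-jobs@example.com"))
        msg.setRecipient(Message.RecipientType.TO, new InternetAddress("me@example.com"))
        msg.setSubject(s"Spark job submitted: $appId")
        msg.setText(s"applicationId: $appId")
        Transport.send(msg)
      }
    }

Call AppIdMailer.sendAppIdMail(spark.sparkContext.applicationId) as early as possible in the driver's main, so the alert goes out even if the job later fails.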