I am using a bash shell with Spark SQL 2.4.1. I submit my Spark job using spark-submit in a shell script.
I need to capture the status of my job. How can this be achieved?
Any help/advice please?
Check the code below.
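# Assumption: app_name, executor_cores, num_executors, driver_memory,
# executor_memory and appJar are defined earlier in the script.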
process_start_datetime=$(date +%Y%m%d%H%M%S)
log_path="<log_dir>"
log_file="${log_path}/${app_name}_${process_start_datetime}.log"
spark-submit \
--verbose \
--deploy-mode cluster \
--executor-cores "$executor_cores" \
--num-executors "$num_executors" \
--driver-memory "$driver_memory" \
--executor-memory "$executor_memory" \
--master yarn \
--class main.App "$appJar" 2>&1 | tee -a "$log_file"
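# Assumption (not part of the original answer): because of "| tee", the exit code
# of the pipeline above is that of tee; read ${PIPESTATUS[0]} immediately after the
# pipeline if you also want spark-submit's own exit code.
submit_exit_code=${PIPESTATUS[0]}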
status=$(grep "final status:" < "$log_file" | cut -d ":" -f2 | tail -1 | awk '$1=$1')
To get the application ID:
applicationId=$(grep "tracking URL" < "$log_file" | head -n 1 | cut -d "/" -f5)
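Once the status and application ID are captured, the wrapper script can act on them. A minimal sketch, assuming the variables set above and that YARN reports the final status as SUCCEEDED when the job completes successfully:
echo "YARN application id : ${applicationId}"
echo "Final status        : ${status}"

# Fail the wrapper script if the Spark job did not succeed on YARN.
if [ "$status" = "SUCCEEDED" ]; then
    echo "Spark job completed successfully"
else
    echo "Spark job failed or was killed, see ${log_file} for details" >&2
    exit 1
fi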
spark-submit is asynchronous, so once the command is submitted you can get the application ID by calling SparkContext.applicationId inside the application. You can then check the status of that application.
Reference: https://issues.apache.org/jira/browse/SPARK-5439
If Spark is deployed on YARN, you can check the status with:
# To get the application ID, use: yarn application -list
yarn application -status application_1459542433815_0002
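For example, a rough polling sketch, assuming the yarn application -status report contains "State" and "Final-State" lines (the exact field names can vary by Hadoop version) and reusing the example application ID from above:
app_id="application_1459542433815_0002"

# Wait until YARN reports a terminal state for the application.
while true; do
    state=$(yarn application -status "$app_id" 2>/dev/null \
            | grep -E "^[[:space:]]*State" | cut -d ":" -f2 | tr -d '[:space:]')
    case "$state" in
        FINISHED|FAILED|KILLED) break ;;
    esac
    sleep 30
done

# Final-State is SUCCEEDED only when the job itself finished successfully.
final_state=$(yarn application -status "$app_id" 2>/dev/null \
              | grep -E "^[[:space:]]*Final-State" | cut -d ":" -f2 | tr -d '[:space:]')
echo "YARN final state: $final_state"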
There is another approach mentioned in this answer.