
I am using bash with spark-sql 2.4.1. I submit my Spark job using spark-submit from a shell script.

I need to capture the status of the job. How can this be achieved?

Any help/advice please?

BdEngineer

2 Answers


Check the code below.

# Variables such as $app_name, $executor_cores, $num_executors,
# $driver_memory, $executor_memory and $appJar are assumed to be set
# earlier in the script.
process_start_datetime=$(date +%Y%m%d%H%M%S)
log_path="<log_dir>"
log_file="${log_path}/${app_name}_${process_start_datetime}.log"

# Stream the spark-submit output to the console and append it to the log file.
spark-submit \
    --verbose \
    --deploy-mode cluster \
    --executor-cores "$executor_cores" \
    --num-executors "$num_executors" \
    --driver-memory "$driver_memory" \
    --executor-memory "$executor_memory" \
    --master yarn \
    --class main.App "$appJar" 2>&1 | tee -a "$log_file"

status=$(grep "final status:" < "$log_file" | cut -d ":" -f2 | tail -1 | awk '$1=$1')

To get the application ID:

applicationId=$(grep "tracking URL" < "$log_file" | head -n 1 | cut -d "/" -f5)
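
If you also want the script to act on the result, a minimal sketch could look like the following; it assumes the $status and $applicationId variables extracted above and that the yarn CLI is available on the host.

# Hypothetical follow-up: fail the script when the job did not succeed.
echo "Application ID: ${applicationId}, final status: ${status}"

if [ "$status" != "SUCCEEDED" ]; then
    # Pull the aggregated YARN logs into the same log file for debugging.
    yarn logs -applicationId "$applicationId" >> "$log_file"
    exit 1
fi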
Srinivas
  • 8,957
  • 2
  • 12
  • 26

spark-submit runs the job asynchronously, so once the command is submitted you can get the application ID by calling SparkContext.applicationId from within the application. You can then check the status.

Reference: https://issues.apache.org/jira/browse/SPARK-5439

If Spark is deployed on YARN, you can check the status with:

# To get the application ID, use: yarn application -list
yarn application -status application_1459542433815_0002
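
For a fire-and-forget submission (e.g. cluster mode with spark.yarn.submit.waitAppCompletion=false), a minimal sketch that polls YARN for the final state could look like this; the application name my_app_name and the parsing of the Final-State field from the yarn application -status report are assumptions for illustration.

# Hypothetical sketch: look up the application ID by name, then poll until YARN
# reports a final state other than UNDEFINED.
app_id=$(yarn application -list -appStates ALL | grep "my_app_name" | awk '{print $1}' | head -n 1)

final_state="UNDEFINED"
while [ "$final_state" = "UNDEFINED" ]; do
    sleep 30
    final_state=$(yarn application -status "$app_id" | grep "Final-State" | cut -d ":" -f2 | awk '$1=$1')
done

echo "Job ${app_id} finished with final state: ${final_state}"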

There is another way mentioned in this answer.

Som