
I have a pipeline in AWS Data Pipeline that runs a shell script named shell.sh:

$ spark-submit transform_json.py


The output looks like this:

Running command on cluster...
[54.144.10.162] Running command...
[52.206.87.30] Running command...
[54.144.10.162] Command complete.
[52.206.87.30] Command complete.
run_command finished in 0:00:06.

The AWS Data Pipeline console says the job is "FINISHED", but in the stderr log I see that the job was actually aborted:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: xxxxx, AWS Error Code: null, AWS Error Message: Not Found...        
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows.
    ...
        20/05/22 11:42:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
        20/05/22 11:42:47 INFO MemoryStore: MemoryStore cleared
        20/05/22 11:42:47 INFO BlockManager: BlockManager stopped
        20/05/22 11:42:47 INFO BlockManagerMaster: BlockManagerMaster stopped
        20/05/22 11:42:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
        20/05/22 11:42:47 INFO SparkContext: Successfully stopped SparkContext
        20/05/22 11:42:47 INFO ShutdownHookManager: Shutdown hook called

I'm somewhat new to Data Pipeline and Spark and can't wrap my head around what's actually happening behind the scenes. How do I get the shell script to catch the SparkException?

gogolaygo
  • Did you check https://stackoverflow.com/questions/36034928/spark-exception-task-failed-while-writing-rows? – Sully May 26 '20 at 22:33
  • @HithamS.AlQadheeb I have, actually. That thread doesn't talk about how to handle errors though, just why the error happens in the first place. – gogolaygo May 26 '20 at 22:43

1 Answer


Try something like the example below.

Your shell script can catch the error code like this, where a non-zero exit code means an error.

$? is the exit status of the most recently executed command; by convention, 0 means success and anything else indicates failure.


spark-submit transform_json.py

# $? holds the exit status of the last command (here, spark-submit).
# Propagate a non-zero status so Data Pipeline sees the failure.
ret_code=$?
if [ $ret_code -ne 0 ]; then
    exit $ret_code
fi

You have to make your Python code return a non-zero exit code, for example via sys.exit(-1), in the error condition. For Python exception handling, check this:

Exit codes in Python
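For illustration, here is a minimal sketch of what that error handling inside transform_json.py could look like. The app name, S3 paths, and the transformation itself are placeholders, not the actual job; the point is the try/except that turns a Spark failure into a non-zero exit code for the calling shell script.

import sys
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("transform_json").getOrCreate()
    try:
        # Placeholder transformation: read JSON from S3, write it back as Parquet.
        df = spark.read.json("s3://your-bucket/input/")
        df.write.mode("overwrite").parquet("s3://your-bucket/output/")
    except Exception as err:
        # Surface the failure to the caller as a non-zero exit code,
        # so $? (or set -e) in shell.sh can react to it.
        sys.stderr.write("Spark job failed: {}\n".format(err))
        sys.exit(1)
    finally:
        spark.stop()

if __name__ == "__main__":
    main()

With this in place, a failed write makes spark-submit return a non-zero status, which the shell snippet above picks up.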

Ram Ghadiyaram
  • Unfortunately it didn't work, I'm still trying a few things – gogolaygo May 29 '20 at 18:27
  • What was the error? Capturing the exit code this way works (even for Scala or Java). The one thing you have to do is exit with a non-zero exit code in your exception handling when your Python code hits an error. – Ram Ghadiyaram May 29 '20 at 18:29
  • Let me know if it's working for you. If so, you can close this thread by accepting the answer as the owner. – Ram Ghadiyaram May 29 '20 at 21:12
  • I think I was able to figure it out: I added "set -e" at the start of my script and left the command as is ("spark-submit transform_json.py"). It's working now. Thanks though! – gogolaygo May 29 '20 at 21:14
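For reference, here is a minimal sketch of the set -e variant of shell.sh described in the last comment, assuming the script contains nothing but the spark-submit call:

#!/bin/bash
# Abort the script as soon as any command exits with a non-zero status,
# so a failed spark-submit surfaces to AWS Data Pipeline instead of the
# activity being reported as FINISHED.
set -e

spark-submit transform_json.py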