4

I get this error message every now and then, which makes the job very unreliable.

On closer inspection, with continuous logging enabled, I see the following error:

2021-09-02 10:38:19,810 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(70)): Unknown error from Python: Error Traceback is not available.

The above error does not say where the issue lies either. I am running Glue 2.0 with Python 3 and worker type G.1X. The job pulls only 100,000 rows, so this should not be a memory issue at all.

The AWS documentation page for this error is not much help. All it says is:

The AWS Glue job fails with the error "Command failed with exit code 10"
Check the CloudWatch logs for the job to find errors related to executors. This error usually occurs during the shuffle stage of Spark.

Where is the script failing?

Ankit Goel

My suggestion would be to enable the Spark UI on your job. That way you can see more detailed metrics on how your executors are behaving. How to configure this is described here: https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-jobs.html. Alternatively, if you have a support plan, raise a ticket with AWS Support and supply your job run ID. – Eman Sep 02 '21 at 19:47

2 Answers

0

This error occurs in AWS Glue versions 3.0 and 4.0 when the job uses an AWS Glue security configuration but the S3 bucket policy denies unencrypted s3:PutObject requests.

To resolve this issue, run job.init() at the beginning of the script to bring the AWS Glue security configuration into effect. If you start the Spark session before job.init(), then the Spark security configuration properties are overridden, and the error occurs.
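As a rough illustration of that ordering, here is a minimal sketch of a Glue script skeleton. The function name `run_job` and the placement of the imports inside it are my own choices for the example, not anything AWS prescribes, and the read/transform/write step is left as a placeholder. It only runs inside a Glue environment where `awsglue` and `pyspark` are available.

```python
import sys


def run_job(argv):
    # Glue-only imports are kept inside the function (a choice made for
    # this sketch) so the file can still be imported where the awsglue
    # package is not installed.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # JOB_NAME is passed to every Glue job run.
    args = getResolvedOptions(argv, ["JOB_NAME"])

    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)

    # Initialise the job before doing any other Spark work, so the
    # security configuration is in effect for everything that follows.
    job.init(args["JOB_NAME"], args)

    # Only touch the Spark session after job.init().
    spark = glue_context.spark_session

    # ... read, transform, and write data here (placeholder) ...

    job.commit()


# In the actual Glue script, finish with:
#     run_job(sys.argv)
```

The point is only the order of the calls: nothing that builds or configures its own Spark session appears above `job.init()`.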

Abdul Haseeb
0

I had job.init() in my code already.

For me, the exit code 10 error was fixed after destroying the job and redeploying it. It might not work for you, but it's worth a shot!

We use Terraform, so this was easy. If you are just using the console, try deploying a new job with the same code and settings to see if it'll work.

It's a pretty ambiguous error code without a clear answer, but it seems to be a Glue configuration error that prevents the code from running at all. For me, the job wouldn't even send a print("hello world") to CloudWatch Logs until I recreated it.
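If you want to tell "the script never started" apart from "the script started and then failed", a small wrapper like the one below may help. It is only a sketch using the Python standard library. The helper name `run_with_traceback` and the marker strings are made up for this example, and it assumes the driver's stdout ends up in the job's CloudWatch logs.

```python
import traceback


def run_with_traceback(body):
    """Run body() and print start/finish markers plus any traceback."""
    # Flush right away so the marker is not lost in a buffer if the
    # process dies shortly afterwards.
    print("job script started", flush=True)
    try:
        body()
    except Exception:
        # Print the full Python traceback ourselves, since the Glue
        # error message may not include one.
        print(traceback.format_exc(), flush=True)
        raise  # re-raise so the job run is still marked as failed
    print("job script finished", flush=True)
```

You would pass your own job function as `body`. If not even the "job script started" line appears in the logs, the problem is most likely outside your code, which matches what I saw before recreating the job.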