9

In my Glue job, I have enabled Spark UI and specified all the necessary details (s3 related etc.) needed for Spark UI to work.
How can I view the DAG/Spark UI of my Glue job?

סטנלי גרונן

1 Answer

7

You need to set up an EC2 instance that can host the Spark history server.

The documentation below links to CloudFormation templates that you can use: https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-history.html

You can then access the history server through the EC2 instance (port 18080 by default). You need to configure the network and ports accordingly.
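To check that the history server is reachable once the ports are open, a minimal Python sketch along these lines may help. It assumes the server exposes Spark's standard REST endpoint `/api/v1/applications` on the default port 18080. The function names and the hostname in the usage note are placeholders of mine, and the scheme (`http` or `https`) depends on how your server was set up.

```python
import json
import urllib.request


def history_server_url(host, port=18080, scheme="http",
                       path="/api/v1/applications"):
    """Build the URL of the Spark history server REST endpoint."""
    return f"{scheme}://{host}:{port}{path}"


def list_applications(host, port=18080, scheme="http", timeout=10):
    """Return the applications the history server has loaded from the logs.

    Raises urllib.error.URLError if the host or port is not reachable,
    which usually points at a security group or network problem.
    """
    url = history_server_url(host, port, scheme)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `list_applications("ec2-host.example.com")` (a hypothetical hostname) should return one entry per Glue job run whose logs the server has picked up; each entry carries an `id` and a `name`.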

EDIT: There is also an option to set up the Spark UI locally. This requires downloading the Docker image from the aws-glue-samples repo and setting the AWS credentials and S3 location there. This server consumes the files that the Glue job generates. The files are about 4 MB in size.
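For the local option, here is a rough Python sketch of how the container could be launched. It only assembles the `docker run` arguments. The image tag `glue/sparkui:latest` and the `/opt/spark/bin/spark-class` path are assumptions about how the image from aws-glue-samples was built, so adjust them to your setup. The `spark.history.fs.logDirectory` and `fs.s3a` keys are standard Spark and Hadoop settings; the function name is mine.

```python
def build_history_server_command(event_log_dir, access_key, secret_key,
                                 image="glue/sparkui:latest", port=18080):
    """Return the `docker run` argument list for a local Spark history server.

    event_log_dir is the S3 location the Glue job writes to, given as an
    s3a:// URI. The image tag and spark-class path are assumptions.
    """
    history_opts = " ".join([
        f"-Dspark.history.fs.logDirectory={event_log_dir}",
        f"-Dspark.hadoop.fs.s3a.access.key={access_key}",
        f"-Dspark.hadoop.fs.s3a.secret.key={secret_key}",
    ])
    return [
        "docker", "run", "-itd",
        # Pass the log location and credentials to the history server JVM.
        "-e", f"SPARK_HISTORY_OPTS={history_opts}",
        # Map the chosen local port to the server's default port 18080.
        "-p", f"{port}:18080",
        image,
        "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer",
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; if the container starts, the UI should be available at http://localhost:18080. Passing credentials on the command line exposes them to anyone who can inspect the container, so treat this as a local-only convenience.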

jay.cs
  • Thanks a lot for the reply, jay.cs. I think it would be much more beneficial if AWS provided a way to view the Spark UI directly from the Glue console. Glue developers are given only developer access and are never allowed to launch a CloudFormation stack. AWS could have done better here. Accepting and closing my question. Thanks. – Ankur Shrivastava Dec 10 '19 at 23:52
  • Yeah, the documentation is misleading. It talks about setting up the UI, but all it instructs you to do is enable log streaming from the dev endpoint Spark instance to S3. When you use a dev endpoint notebook, you are given a UI server URL, but at least for me, it's inaccessible. I don't know what, if anything, the URL should connect to, and the very next page in the documentation tells you to set up the history server and point it at the logs. – Chris Ivan May 24 '21 at 09:58