12

How can I monitor the progress of a job through the Spark web UI? When running Spark locally, I can access the Spark UI on port 4040 at http://localhost:4040.

ZygD
Salem Othman

2 Answers

5

Following this Colab notebook, you can do the following.

First, configure the Spark UI and start a Spark session:

import findspark
findspark.init()
from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf


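# Move the Spark UI off its default port 4040, which ngrok's local API uses below
conf = SparkConf().set('spark.ui.port', '4050')
sc = SparkContext(conf=conf)
# getOrCreate() reuses the SparkContext created above
spark = SparkSession.builder.master('local[*]').getOrCreate()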

In the next cell run:

!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
get_ipython().system_raw('./ngrok http 4050 &')

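This downloads ngrok and opens a tunnel with a public URL through which you can access the Spark UI (wait about 10 seconds for it to start).

To get that URL, query ngrok's local API, which listens on port 4040 (this is why the Spark UI was moved to port 4050):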

!curl -s http://localhost:4040/api/tunnels

This prints JSON that looks something like this (truncated):

{"tunnels":[{"name":"command_line","uri":"/api/tunnels/command_line","public_url":"https://1b881e94406c.ngrok.io","proto":"https", ... }

The "public_url" value above is your Spark UI's URL.

Or, to print only the URL, run this:

!curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

I've tested it and it works for me.
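The same lookup can also be sketched in plain Python, which avoids the shell pipe and prefers the https tunnel when ngrok lists more than one. This is only a minimal sketch: it assumes ngrok's local API is reachable at http://localhost:4040/api/tunnels as above, and the helper names `pick_public_url` and `fetch_public_url` are hypothetical, not part of ngrok or Spark.

```python
import json
import urllib.request

# Assumption: ngrok's local API address, the same one queried with curl above.
NGROK_API = "http://localhost:4040/api/tunnels"


def pick_public_url(payload: dict) -> str:
    """Return a public URL from a parsed /api/tunnels response, preferring https."""
    tunnels = payload.get("tunnels", [])
    if not tunnels:
        raise RuntimeError("ngrok reports no tunnels yet; wait a few seconds and retry")
    for tunnel in tunnels:
        if tunnel.get("proto") == "https":
            return tunnel["public_url"]
    # No https tunnel listed: fall back to the first one.
    return tunnels[0]["public_url"]


def fetch_public_url(api_url: str = NGROK_API) -> str:
    """Query the local ngrok API and return the tunnel's public URL."""
    with urllib.request.urlopen(api_url, timeout=5) as response:
        return pick_public_url(json.load(response))
```

In a notebook cell, `print(fetch_public_url())` would then print the URL once ngrok has had a few seconds to start.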

ponadto
  • I just re-visited the Colab notebook and it works for me. The last step failed initially, but that's probably because there should be a sleep that gives the UI some time to set up properly. You might simply re-run it and it should be OK. If it doesn't, please let me know what errors you're getting or what doesn't seem to work. – ponadto Feb 02 '21 at 15:09
  • wait for 10-15 seconds before running !curl -s http://localhost:4040/api/tunnels – Ezio Sep 05 '21 at 21:02
  • @ponadto when I click on the generated link, it takes me to a sign-up page for ngrok to get an authorization token. How would I get to the interactive Spark UI? – Hira Tanveer Jan 01 '22 at 18:48
  • @HiraTanveer it's a paid service with a limit on the number of URLs generated per hour (or day, not sure). The only suggestion I might have is: try to be frugal with the number of Spark UIs you generate. – ponadto Jan 02 '22 at 19:10
0

The approach can be quite short:

!pip install -q pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.ui.port', '4050').getOrCreate()

!wget -qnc https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip -n -q ngrok-stable-linux-amd64.zip
get_ipython().system_raw('./ngrok http 4050 &')
!sleep 5
!curl -s http://localhost:4040/api/tunnels | grep -Po 'public_url":"(?=https)\K[^"]*'

The last command prints the tunnel's public URL, which serves the Spark UI.
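A fixed `sleep 5` may be too short on a slow runtime. As a hedged alternative, the sketch below polls ngrok's local API until a tunnel is listed. The names `wait_for_tunnel` and `fetch_ngrok_url` and their parameters are hypothetical, and the endpoint is assumed to be the same http://localhost:4040/api/tunnels used above.

```python
import json
import time
import urllib.request


def wait_for_tunnel(fetch, attempts=10, delay=2.0):
    """Call fetch() until it returns a non-empty URL or the attempts run out."""
    for attempt in range(attempts):
        try:
            url = fetch()
            if url:
                return url
        except (OSError, ValueError, LookupError):
            # API not listening yet, or the response is incomplete: retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    raise TimeoutError(f"no ngrok tunnel after {attempts} attempts")


def fetch_ngrok_url(api_url="http://localhost:4040/api/tunnels"):
    """Return the first tunnel's public URL, or None if no tunnel is listed yet."""
    with urllib.request.urlopen(api_url, timeout=5) as response:
        tunnels = json.load(response)["tunnels"]
    return tunnels[0]["public_url"] if tunnels else None
```

Calling `print(wait_for_tunnel(fetch_ngrok_url))` after starting ngrok would replace the `!sleep 5` and `!curl` lines. Passing the fetch function as an argument keeps the retry loop easy to try out without a running tunnel.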

ZygD