How can I monitor the progress of a job through the Spark Web UI? When running Spark locally, I can access the Spark UI on port 4040, at http://localhost:4040.
2 Answers
5
Following this Colab notebook, you can do the following.
First, configure the Spark UI and start a Spark session:
import findspark
findspark.init()  # make the local Spark installation importable

from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf

# Put the Spark UI on port 4050 (the port we will tunnel with ngrok below)
conf = SparkConf().set('spark.ui.port', '4050')
sc = SparkContext(conf=conf)
spark = SparkSession.builder.master('local[*]').getOrCreate()
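To double-check that the UI actually came up on the configured port, you can print the address Spark itself reports (a quick sanity check; uiWebUrl is a SparkContext property in Spark 2.1+):
# Print the address the Spark UI is listening on, e.g. http://<host>:4050
print(sc.uiWebUrl)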
In the next cell run:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
# start ngrok in the background, forwarding the Spark UI port (4050)
get_ipython().system_raw('./ngrok http 4050 &')
This downloads and unpacks ngrok and starts a tunnel that gives you a public URL for the Spark UI (give it about 10 seconds to come up).
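If ngrok sends you to a sign-up page asking for an authorization token (as mentioned in the comments), you can create a free account and register the token before starting the tunnel. The exact command depends on the ngrok version; for the v2 binary downloaded above it should be:
# Only needed if ngrok refuses to open a tunnel without an account.
# YOUR_AUTHTOKEN is a placeholder for the token from your ngrok dashboard.
!./ngrok authtoken YOUR_AUTHTOKEN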
Now, to access the URL, call:
!curl -s http://localhost:4040/api/tunnels
which prints out a JSON that looks something like this (truncated):
{"tunnels":[{"name":"command_line","uri":"/api/tunnels/command_line","public_url":"https://1b881e94406c.ngrok.io","proto":"https", ... }
You're looking for the "public_url" value above; that's your Spark UI's URL.
Or, run this:
!curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
I've tested it and it works for me.
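To have something to monitor once the UI is reachable, you can kick off a small throwaway job; a minimal illustration whose stages and tasks will show up under the UI's Jobs and Stages tabs:
# A toy job: sum the ids of a ten-million-row range across a few partitions
df = spark.range(0, 10_000_000)
df.repartition(8).selectExpr("sum(id)").show()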

ponadto
- I just re-visited the Colab notebook and it works for me. The last step failed initially, but that's probably because there should be a sleep that gives the UI some time to set up properly. You might simply re-run it and it should be OK. And if it doesn't -- please let me know (what kinds of errors you're getting / what doesn't seem to work). – ponadto Feb 02 '21 at 15:09
- Wait for 10-15 seconds before running !curl -s http://localhost:4040/api/tunnels – Ezio Sep 05 '21 at 21:02
- @ponadto when I click on the generated link, it takes me to a sign-up page for ngrok to get an authorization token; how would I get to the interactive Spark UI? – Hira Tanveer Jan 01 '22 at 18:48
- @HiraTanveer it's a paid service with a limit on the number of URLs generated per hour (or day, not sure). The only suggestion I might have is: try to be frugal with the number of Spark UIs you generate. – ponadto Jan 02 '22 at 19:10
0
The approach can be quite short:
!pip install -q pyspark
from pyspark.sql import SparkSession

# Start Spark with the UI on port 4050
spark = SparkSession.builder.config('spark.ui.port', '4050').getOrCreate()

# Download ngrok and tunnel the UI port in the background
!wget -qnc https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip -n -q ngrok-stable-linux-amd64.zip
get_ipython().system_raw('./ngrok http 4050 &')

# Give ngrok a moment to start, then print the public URL of the Spark UI
!sleep 5
!curl -s http://localhost:4040/api/tunnels | grep -Po 'public_url":"(?=https)\K[^"]*'
Result: the command prints the public ngrok URL of the Spark UI.

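If you prefer to stay in Python rather than rely on grep -P, roughly the same lookup can be done with the standard library (a small sketch, assuming ngrok's local API is on its default port 4040):
import json, urllib.request

# Ask ngrok's local API for the open tunnels and take the first public URL
with urllib.request.urlopen('http://localhost:4040/api/tunnels') as resp:
    tunnels = json.load(resp)['tunnels']
print(tunnels[0]['public_url'])  # this is the address of the Spark UI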
ZygD
- This is basically the same as the other answer, posted a year ago. – OneCricketeer Nov 08 '21 at 14:04
- It is way shorter, without unnecessary steps. Moreover, the other answer does not work standalone. – ZygD Nov 08 '21 at 14:13
- Sure. Thank you for shortening the steps, but still, running everything in the same cell kind of defeats the purpose of exploratory notebooks. – OneCricketeer Nov 08 '21 at 14:18