Questions tagged [spark-ui]

The web interface of a running Spark application, used to monitor and inspect Spark job execution in a web browser.

76 questions
12 votes · 2 answers

How to open Spark UI when working on Google Colab?

How can I monitor the progress of a job through the Spark Web UI? Running Spark locally, I can access the Spark UI on port 4040, at http://localhost:4040.
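A minimal sketch of one common workaround, assuming the Colab-specific google.colab.output helpers are available in the runtime (the port-proxy helper is Colab's, not Spark's):

```python
# Hedged sketch: expose the driver UI (default port 4040) from a Colab notebook.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("ui-demo").getOrCreate()

# Colab can proxy a kernel-local port to the browser; this assumes the
# google.colab.output helpers exist in the runtime.
from google.colab import output
output.serve_kernel_port_as_window(4040)
```

Outside Colab, http://localhost:4040 works directly while the SparkContext is alive.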
9 votes · 1 answer

How to view AWS Glue Spark UI

In my Glue job, I have enabled Spark UI and specified all the necessary details (S3-related, etc.) needed for the Spark UI to work. How can I view the DAG/Spark UI of my Glue job?
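One hedged sketch of the setup side, using boto3; the job name, role ARN, and bucket are placeholders, while --enable-spark-ui and --spark-event-logs-path are Glue's documented special parameters:

```python
# Hedged sketch: turn on Spark UI event logging for an existing Glue job.
import boto3

glue = boto3.client("glue")
glue.update_job(
    JobName="my-glue-job",  # placeholder
    JobUpdate={
        "Role": "arn:aws:iam::123456789012:role/GlueJobRole",        # placeholder
        "Command": {"Name": "glueetl",
                    "ScriptLocation": "s3://my-bucket/script.py"},   # placeholder
        "DefaultArguments": {
            "--enable-spark-ui": "true",
            "--spark-event-logs-path": "s3://my-bucket/spark-logs/",  # placeholder
        },
    },
)
```

The DAG is then viewed by pointing a Spark history server at that S3 path; AWS publishes a Docker image and a CloudFormation template for exactly this.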
6 votes · 0 answers

Spark UI: How to understand the min/med/max in the DAG

I would like to fully understand the meaning of the min/med/max information. For example: scan time total (min, med, max): 34m (3.1s, 10.8s, 15.1s) means that, of all cores, the min scan time is 3.1s and the max is 15.1s, while the total time accumulated is…
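On what the statistics aggregate over: they are computed per task within the stage, so the total can far exceed wall-clock time when tasks run in parallel. A toy illustration with made-up per-task values:

```python
# Toy illustration: "total (min, med, max)" summarises per-task metric values.
import statistics

task_scan_times_s = [3.1, 4.0, 9.5, 10.8, 12.2, 14.0, 15.1]  # made-up numbers

total = sum(task_scan_times_s)             # accumulated across all tasks
mn = min(task_scan_times_s)                # fastest task
md = statistics.median(task_scan_times_s)  # middle task
mx = max(task_scan_times_s)                # slowest task
print(f"total={total:.1f}s (min={mn}s, med={md}s, max={mx}s)")
```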
6 votes · 3 answers

Spark UI not showing tabs (Jobs, Stages, Storage, Environment, ...) when run in standalone mode

I'm running the Spark master with the following command: ./sbin/start-master.sh. After that I went to http://localhost:8080 and saw the following page. I was expecting to see the tabs with Jobs, Environment, etc., like the following. Could someone…
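The gap here is that port 8080 serves the standalone master UI (workers, submitted applications), while the Jobs/Stages/Storage/Environment tabs belong to the application UI on the driver's port 4040, which only exists while an application is running. A minimal sketch, assuming the master came up with its default URL spark://localhost:7077:

```python
# Hedged sketch: attach an application to a local standalone master.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://localhost:7077")  # assumed default master URL
         .appName("tabs-demo")
         .getOrCreate())

# While this session is alive, the Jobs/Stages/Storage/Environment tabs
# are served by the driver at http://localhost:4040, not the master at :8080.
input("Open http://localhost:4040, then press Enter to stop...")
spark.stop()
```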
5 votes · 2 answers

Spark SQL: Why am I seeing 3 jobs instead of a single job in the Spark UI?

As per my understanding, there is one job for each action in Spark. But I often see more than one job triggered for a single action. I tried to test this by doing a simple aggregation on a dataset to get the maximum from each…
Remis Haroon - رامز • 3,304 • 4 • 34 • 62
4 votes · 1 answer

How to fix Spark UI Executors java.io.FileNotFoundException

I've deployed a Spring Boot server with Apache Spark and everything works stably. But the Spark UI executors endpoint, http://X.X.X.X:4040/executors/, throws java.io.FileNotFoundException and cannot find /opt/x/x!/BOOT-INF/lib/spark-core_2.11-2.2.0.jar. I…
4 votes · 1 answer

Apache Spark: Relationship between action and job, Spark UI

To the best of my understanding to date, in Spark a job is submitted whenever an action is called on a Dataset/DataFrame. The job may further be divided into stages and tasks, and I understand how to find the number of stages and tasks.…
Vipul Rajan • 494 • 1 • 5 • 16
3 votes · 1 answer

What is Spark spill (both disk and memory)?

As per the documentation: Shuffle spill (memory) is the size of the deserialized form of the shuffled data in memory. Shuffle spill (disk) is the size of the serialized form of the data on disk. My understanding of shuffle is this: Every…
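A hedged way to see the two numbers side by side is to force a wide shuffle-and-sort under deliberately tight memory; exact figures depend on the environment, and note spark.driver.memory only takes effect if set before the JVM starts:

```python
# Hedged sketch: provoke shuffle spill so the metrics appear in the UI.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .master("local[2]")
         .config("spark.driver.memory", "512m")        # deliberately tight
         .config("spark.sql.shuffle.partitions", "4")  # few, large partitions
         .getOrCreate())

big = spark.range(10_000_000).withColumn("k", F.col("id") % 100)
(big.repartition(4, "k")               # shuffle
    .sortWithinPartitions("id")        # sort that can exceed execution memory
    .write.format("noop").mode("overwrite").save())
# When rows no longer fit in execution memory the sorter spills:
# "spill (memory)" reports their deserialized in-memory size at spill time,
# "spill (disk)" the serialized size of the files written out.
```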
3 votes · 2 answers

What is shufflequerystage in the Spark DAG?

What is the shufflequerystage box that I see in the Spark DAGs? How is it different from the exchange box in the Spark stages?
figs_and_nuts • 4,870 • 2 • 31 • 56
3 votes · 1 answer

Multiple jobs from a single action (Read, Transform, Write)

Currently using PySpark on a Databricks interactive cluster (with Databricks Connect to submit jobs) and Snowflake for input/output data. My Spark application is supposed to read data from Snowflake, apply some simple SQL transformations (mainly…
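A hedged sketch of such a pipeline; it assumes an existing spark session on a cluster where the Snowflake connector is on the classpath and registered under the short name "snowflake" (as on Databricks), and all connector options are placeholders:

```python
# Hedged sketch: read -> transform -> write, with placeholder credentials.
sf_options = {
    "sfURL": "account.snowflakecomputing.com",  # placeholder
    "sfUser": "...", "sfPassword": "...",       # placeholders
    "sfDatabase": "DB", "sfSchema": "PUBLIC", "sfWarehouse": "WH",
}

df = (spark.read.format("snowflake").options(**sf_options)
      .option("dbtable", "SRC_TABLE").load())          # hypothetical table
out = df.filter("amount > 0").groupBy("customer_id").sum("amount")
(out.write.format("snowflake").options(**sf_options)
     .option("dbtable", "DST_TABLE").mode("overwrite").save())
# save() is the only action, but the UI can still show several jobs:
# the connector may run its own job to fetch the source relation, and
# the shuffle and the write each contribute jobs of their own.
```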
3 votes · 0 answers

How to avoid showing secret values in the Spark UI

I am passing some secret keys in the spark-submit command. I am using the following to redact the key: --conf 'spark.redaction.regex=secret_key'. Though it is working, the secret_key is visible in the Spark UI during job execution. The redaction takes place at the…
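The catch, as far as I can tell, is that spark.redaction.regex matches configuration *names*, not values, while spark.sql.redaction.string.regex is the one that scrubs matching substrings from SQL explain/UI output. A hedged sketch with a hypothetical key and placeholder pattern:

```python
# Hedged sketch: redaction must be configured before values reach the UI.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         # values of properties whose *names* match are shown redacted:
         .config("spark.redaction.regex", "(?i)secret|password|token")
         # substrings matching this are scrubbed from SQL explain/UI strings:
         .config("spark.sql.redaction.string.regex", "hunter2")  # placeholder pattern
         .config("spark.myapp.secretKey", "hunter2")             # hypothetical key
         .getOrCreate())
# Environment tab: spark.myapp.secretKey's value appears redacted because
# the key *name* matches the regex; the same value passed under a
# non-matching name would still be visible, which matches the behaviour
# described in the question.
```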
3 votes · 0 answers

Job and task duration relationship in the Spark UI

I am trying to understand the Spark UI to monitor timings, but I am having difficulty understanding the relationship between job duration and task duration. For the jobs below it says total run time 13 min, but when I open the stage (which has 1 stage and 1…
gkarya42 • 429 • 6 • 22
3 votes · 0 answers

Any API to get the data on the query DAG from the Spark UI SQL tab?

The Spark UI has an SQL tab. It can show the query details as a DAG: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/operation_spark_applications.html. After the application finishes, the DAG also annotates its nodes with statistic…
Joe C • 2,757 • 2 • 26 • 46
3 votes · 1 answer

No start-history-server.sh when pyspark installed through conda

I have installed pyspark in a miniconda environment on Ubuntu through conda install pyspark. So far everything works fine: I can run jobs through spark-submit and I can inspect running jobs at localhost:4040. But I can't locate…
oulenz • 1,199 • 1 • 15 • 24
3 votes · 1 answer

Can't access Spark UI through YARN

I'm building a Docker image to run Zeppelin or spark-shell locally against a production Hadoop cluster with YARN. Edit: the environment was macOS. I can execute jobs or a spark-shell fine, but when I try to access the Tracking URL on YARN meanwhile…
Pau Trepat • 697 • 1 • 6 • 24