
I am a naive user of Spark. I installed Spark, used Anaconda to install PySpark, and then ran the basic code below in a Jupyter notebook. I then opened the Spark Web UI, but I am unable to see any jobs, either running or completed. Any comments are appreciated.

from pyspark.sql import SparkSession
spark = SparkSession.builder\
    .master("local")\
    .appName("NQlabtop")\
    .config('spark.ui.port', '4050')\
    .getOrCreate()
sc = spark.sparkContext
input_file = sc.textFile("C:/Users/nqazi/NQ/anscombe.json")
words = input_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1))
counts = words.reduceByKey(lambda a, b: a + b)
print("counts", counts)
sc = spark.sparkContext
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)

Please see the image of the Spark Web UI below. I am not sure why I cannot see any jobs, as I think it should display the completed ones.

[Screenshot: Spark Web UI showing no running or completed jobs]

Nhqazi

1 Answer


There are two types of functions in PySpark (Spark): transformations and actions. Transformations are lazily evaluated, and PySpark doesn't run any job until you call an action function such as show, count, or collect.
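
For example, reusing the session setup from the question (a minimal sketch; the small in-memory dataset and the doubled variable are just for illustration), no job appears in the Web UI until the final collect call:

from pyspark.sql import SparkSession
spark = SparkSession.builder\
    .master("local")\
    .appName("NQlabtop")\
    .config('spark.ui.port', '4050')\
    .getOrCreate()
sc = spark.sparkContext
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)           # transformation setup: no job yet
doubled = distData.map(lambda x: x * 2)   # transformation: still no job
print(doubled.collect())                  # action: a job now shows up in the Web UI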

Mykola Zotko
  • I used the collect method, but still no job is shown :( – Nhqazi Jan 30 '21 at 14:05
  • I have noticed that jobs are only visible in the Web UI when I use spark-shell for the transformation and action. However, when I use a Jupyter notebook to do the same, it does not show any jobs. Any comments? – Nhqazi Jan 30 '21 at 16:20
  • I use Jupyter and can see jobs after action functions. Don't forget to refresh your page. – Mykola Zotko Jan 30 '21 at 19:24
  • @Mykola Zotko, in my case all jobs are visible in the Spark history server, but only jobs run from a Jupyter 'PySpark notebook' show up in the YARN resource manager; jobs run from a Jupyter 'Python notebook' don't show up in the YARN resource manager. Is this expected? – steve Sep 01 '23 at 04:55