Spark History server setup

Question

I am trying to set up the Spark History Server locally. I am using Windows and PyCharm for PySpark programming. I am able to view the Spark web UI at localhost:4040. Here is what I have done so far:

spark-defaults.conf (I added the last three lines):

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.jars.packages                com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1
spark.eventLog.enabled      true
spark.history.fs.logDirectory   file:///D:///tmp///spark-events

Run the history server

 C:\Users\hp\spark>bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 20/08/09 08:58:04 INFO HistoryServer: Started daemon with process name: 13476@DESKTOP-B9KRC6O
 20/08/09 08:58:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 20/08/09 08:58:23 INFO SecurityManager: Changing view acls to: hp
 20/08/09 08:58:23 INFO SecurityManager: Changing modify acls to: hp
 20/08/09 08:58:23 INFO SecurityManager: Changing view acls groups to:
 20/08/09 08:58:23 INFO SecurityManager: Changing modify acls groups to:
 20/08/09 08:58:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hp); groups with view permissions: Set(); users  with modify permissions: Set(hp); groups with modify permissions: Set()
 20/08/09 08:58:24 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
 20/08/09 08:58:26 INFO Utils: Successfully started service on port 18080.
 20/08/09 08:58:26 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://DESKTOP-B9KRC6O:18080

After my PySpark program runs successfully, the job details do not appear in the Spark History Server web UI, even though the server has started. It looks like this:

The localhost:18080 Web UI

The references I have already used:

Windows: Apache Spark History Server Config

How to run Spark History Server on Windows

The code I use is as follows:

 from pyspark import SparkConf
 from pyspark.sql import SparkSession

 conf = SparkConf().setAppName("madhu").setMaster("local")
 # Build the session from the conf; the SparkContext is created implicitly.
 spark = SparkSession.builder.config(conf=conf).getOrCreate()


 def readtable(dbname, table):
     """Read one MySQL table into a DataFrame over JDBC."""
     hostname = "localhost"
     jdbcPort = 3306
     username = "root"
     password = "madhu"
     jdbc_url = "jdbc:mysql://{0}:{1}/{2}?user={3}&password={4}".format(
         hostname, jdbcPort, dbname, username, password)
     dataframe = spark.read.format('jdbc').options(
         driver='com.mysql.jdbc.Driver', url=jdbc_url, dbtable=table).load()
     return dataframe

 t1 = readtable("db", "table1")
 t2 = readtable("db2", "table2")

 # show() prints the rows itself and returns None, so no print() is needed.
 t2.show()
 spark.stop()

Please help me understand how I can achieve this. I can provide any additional information that is required.

I have also tried with directory paths as:

 spark.eventLog.enabled      true
 spark.history.fs.logDirectory   file:///D:/tmp/spark-events

If you look at files in `D:///tmp///spark-events`, what do you see? Does your code also stop the SparkSession? — OneCricketeer, Aug 09 '20 at 05:56
The folder D:///tmp///spark-events is empty. No file is present in it. I have stopped sparksession. — Madhuchaitanya Joshi, Aug 09 '20 at 06:38
Well, if that is empty, then the error you see isn't lying... You'll need to figure out where your logs are going — OneCricketeer, Aug 10 '20 at 04:00

score 0 · Answer 1 · answered Nov 24 '20 at 22:58

You must provide the correct master URL in the application and run the application with spark-submit.

You can find it in the Spark UI at localhost:4040. In the following example, the master URL is spark://XXXX:7077.

Your application should be:

conf = SparkConf().setAppName("madhu").setMaster("spark://XXXX:7077")
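One more thing that may be worth checking, offered as a sketch rather than a confirmed fix: the spark-defaults.conf in the question sets spark.history.fs.logDirectory (where the History Server reads from) but not spark.eventLog.dir (where the application writes to). When spark.eventLog.dir is unset, Spark's documented default is file:///tmp/spark-events, which may not be the directory the History Server is watching. The helper names below (event_log_settings, build_conf) are hypothetical, and the directory is the one from the question.

```python
def event_log_settings(log_dir):
    """Return the Spark properties that make an application write its
    event logs to log_dir (hypothetical helper, not part of PySpark)."""
    return {
        "spark.eventLog.enabled": "true",
        # Assumption: this should match spark.history.fs.logDirectory.
        "spark.eventLog.dir": log_dir,
    }


def build_conf(app_name, master_url, log_dir):
    """Build a SparkConf with event logging enabled (hypothetical helper)."""
    # Imported here so the settings helper above works without PySpark.
    from pyspark import SparkConf

    conf = SparkConf().setAppName(app_name).setMaster(master_url)
    for key, value in event_log_settings(log_dir).items():
        conf.set(key, value)
    return conf
```

With this in place, the application would create its context with SparkContext(conf=build_conf("madhu", "spark://XXXX:7077", "file:///D:/tmp/spark-events")). Setting the properties in code also helps if the application is launched from an IDE and the spark-defaults.conf you edited is not the one that gets picked up.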
