I am trying to set up the Spark History Server locally. I am using Windows and PyCharm for PySpark programming, and I can view the Spark Web UI at localhost:4040. The things I have done are:
spark-defaults.conf (where I have added the last three lines):
```
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

spark.jars.packages              com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1
spark.eventLog.enabled           true
spark.history.fs.logDirectory    file:///D:///tmp///spark-events
```
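(For context: in a standard Spark setup, the running application writes its event logs to the directory given by `spark.eventLog.dir`, while `spark.history.fs.logDirectory` is only read by the History Server; both are normally pointed at the same location. A minimal sketch of the relevant lines, assuming the logs should live under `D:\tmp\spark-events`:)

```
spark.eventLog.enabled           true
spark.eventLog.dir               file:///D:/tmp/spark-events
spark.history.fs.logDirectory    file:///D:/tmp/spark-events
```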
Then I run the history server:
```
C:\Users\hp\spark>bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/08/09 08:58:04 INFO HistoryServer: Started daemon with process name: 13476@DESKTOP-B9KRC6O
20/08/09 08:58:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/09 08:58:23 INFO SecurityManager: Changing view acls to: hp
20/08/09 08:58:23 INFO SecurityManager: Changing modify acls to: hp
20/08/09 08:58:23 INFO SecurityManager: Changing view acls groups to:
20/08/09 08:58:23 INFO SecurityManager: Changing modify acls groups to:
20/08/09 08:58:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hp); groups with view permissions: Set(); users with modify permissions: Set(hp); groups with modify permissions: Set()
20/08/09 08:58:24 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
20/08/09 08:58:26 INFO Utils: Successfully started service on port 18080.
20/08/09 08:58:26 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://DESKTOP-B9KRC6O:18080
```
After I run my PySpark program successfully, I am unable to see any job details in the Spark History Server web UI, even though the server is started. It looks like below:
The references I have already used:
The code I use is as follows:
```python
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("madhu").setMaster("local")
sc = SparkContext(conf=conf)
spark = SparkSession(sc).builder.getOrCreate()

def readtable(dbname, table):
    hostname = "localhost"
    jdbcPort = 3306
    username = "root"
    password = "madhu"
    jdbc_url = "jdbc:mysql://{0}:{1}/{2}?user={3}&password={4}".format(
        hostname, jdbcPort, dbname, username, password)
    dataframe = spark.read.format('jdbc').options(
        driver='com.mysql.jdbc.Driver', url=jdbc_url, dbtable=table).load()
    return dataframe

t1 = readtable("db", "table1")
t2 = readtable("db2", "table2")
t2.show()

spark.stop()
```
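(As a side note, the URL construction inside `readtable` can be factored into a small standalone helper, which makes it easy to sanity-check the string before Spark ever touches it. `make_jdbc_url` is a hypothetical helper name for illustration, not part of any Spark API:)

```python
def make_jdbc_url(hostname, port, dbname, username, password):
    """Build a MySQL JDBC URL of the same shape used in readtable above."""
    return "jdbc:mysql://{0}:{1}/{2}?user={3}&password={4}".format(
        hostname, port, dbname, username, password)

# Quick sanity check without starting Spark:
print(make_jdbc_url("localhost", 3306, "db", "root", "madhu"))
# jdbc:mysql://localhost:3306/db?user=root&password=madhu
```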
Please help me with how I can achieve this. I will provide any additional information that is required.
I have also tried the directory path as:
```
spark.eventLog.enabled           true
spark.history.fs.logDirectory    file:///D:/tmp/spark-events
```
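(For reference, Python's `pathlib` can generate a well-formed `file://` URI from a Windows path, which is one way to double-check the value pasted into spark-defaults.conf. A small sketch, assuming the logs live under `D:\tmp\spark-events`:)

```python
from pathlib import PureWindowsPath

# Convert the Windows directory into a file:// URI; PureWindowsPath works
# even when this snippet is run on a non-Windows machine.
uri = PureWindowsPath(r"D:\tmp\spark-events").as_uri()
print(uri)  # file:///D:/tmp/spark-events
```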