
According to this page and this page, I should be able to launch the Spark History Server on Windows using the following Windows CMD command, issued at the Conda prompt, in the Conda environment of interest:

%SPARK_HOME%/bin/spark-class.cmd ^
org.apache.spark.deploy.history.HistoryServer

I installed Python 3.9 and pyspark under Anaconda, in environment py39, so %SPARK_HOME% in the above command is the following path:

SPARK_HOME=C:\Users\User.Name\anaconda3\envs\py39\lib\site-packages\pyspark

I found the path by starting the pyspark/Python shell and querying the environment variable. It was not set at the Conda shell prompt. For CMD commands at the Conda prompt, therefore, I manually replace %SPARK_HOME% with the above path.
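The same lookup can be done from Python without opening the pyspark shell. This is a sketch under assumptions: `find_spark_home` is a hypothetical helper, and it treats the installed `pyspark` package folder as SPARK_HOME, which is consistent with the path above because a pip/Conda install of pyspark keeps its `bin` and `jars` folders inside that package folder (the Annex log shows the jars under `site-packages/pyspark/jars`).

```python
import importlib.util
import os
from typing import Optional


def find_spark_home() -> Optional[str]:
    """Return SPARK_HOME if it is set, else the folder of the importable pyspark package."""
    env_value = os.environ.get("SPARK_HOME")
    if env_value:
        return env_value
    spec = importlib.util.find_spec("pyspark")
    if spec is None or spec.origin is None:
        return None  # pyspark is not importable from this interpreter
    # spec.origin is the path of pyspark/__init__.py; its folder is the package root
    return os.path.dirname(spec.origin)
```

Calling `print(find_spark_home())` with the py39 interpreter should print the folder to substitute for %SPARK_HOME% in the CMD commands.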

In case it is needed, I also created the following default configuration file (which is needed for pyspark):

# %SPARK_HOME%/conf/spark-defaults.conf
#--------------------------------------
spark.eventLog.enabled true
spark.eventLog.dir C:\\User\\User.Name\\anaconda3\\envs\\py39\\PySparkLogs
spark.history.fs.logDirectory C:\\User\\User.Name\\anaconda3\\envs\\py39\\PySparkLogs
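Spark reads spark-defaults.conf as a Java properties file, which is why the backslashes above are doubled. A variant sometimes used on Windows, offered here as an assumption rather than something shown to avoid this error, is to write the directory as a `file:///` URI with forward slashes. The hypothetical helpers `to_file_uri` and `spark_defaults` below sketch generating such a file from Python.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def to_file_uri(windows_path: str) -> str:
    """Turn 'C:\\a\\b' into 'file:///C:/a/b', percent-encoding characters such as spaces."""
    posix = PureWindowsPath(windows_path).as_posix()  # e.g. 'C:/a/b'
    return "file:///" + quote(posix, safe="/:")


def spark_defaults(log_dir: str) -> str:
    """Return spark-defaults.conf text that points both event-log settings at log_dir."""
    uri = to_file_uri(log_dir)
    lines = [
        "spark.eventLog.enabled true",
        f"spark.eventLog.dir {uri}",
        f"spark.history.fs.logDirectory {uri}",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned string to %SPARK_HOME%\conf\spark-defaults.conf would replace the hand-written file; whether the URI form behaves any differently from the escaped-backslash form in this setup is not established here.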

Before attempting to launch the history server, I also set the environment variables that I normally set for a pyspark session, again just in case they are needed. At the Conda prompt for environment py39:

> set "PYSPARK_DRIVER_PYTHON=python"
> set "PYSPARK_PYTHON=python"
> set "HADOOP_HOME=c:%HOMEPATH%\AppData\Local\Hadoop\2.7.1"
> %SPARK_HOME%/bin/spark-class.cmd ^
   org.apache.spark.deploy.history.HistoryServer

This generates reams of errors, listed in the Annex below, after which the Conda window returns to the Conda prompt. I am new to Python, Spark, and Hadoop, and inexperienced in Java, but I think the key messages in the output are:

23/07/28 16:36:19 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[spark-history-task-0,5,main]
java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'

The last line refers to Hadoop's native I/O, for which, as I understand it, there are no native libraries on Windows. Many believe that this is innocuous, e.g., as described here and here.
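Because the failing call is in Hadoop's Windows native I/O, it may be worth checking what the folder HADOOP_HOME points to actually contains before relaunching. The sketch below mirrors the CMD launch command in Python and reports missing helper files. `history_server_command` and `missing_hadoop_binaries` are hypothetical names, and the assumption that `winutils.exe` and `hadoop.dll` both sit in `%HADOOP_HOME%\bin` reflects a common Windows setup, not something stated in the Spark documentation.

```python
import os
from typing import List


def history_server_command(spark_home: str) -> List[str]:
    """Command list equivalent to: %SPARK_HOME%/bin/spark-class.cmd <HistoryServer class>."""
    script = os.path.join(spark_home, "bin", "spark-class.cmd")
    return [script, "org.apache.spark.deploy.history.HistoryServer"]


def missing_hadoop_binaries(hadoop_home: str) -> List[str]:
    """List which of winutils.exe / hadoop.dll are absent from <hadoop_home>/bin.

    Assumption: these are the two Windows helper files a typical HADOOP_HOME provides.
    """
    bin_dir = os.path.join(hadoop_home, "bin")
    wanted = ["winutils.exe", "hadoop.dll"]
    return [name for name in wanted if not os.path.isfile(os.path.join(bin_dir, name))]
```

With HADOOP_HOME set as above, a non-empty result from `missing_hadoop_binaries(os.environ["HADOOP_HOME"])` would suggest the Hadoop folder is incomplete; an empty list says nothing about whether those files match the Hadoop version that pyspark was built against.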

To try to eke out some insight into what might be causing this error, I looked in the folder specified for spark.history.fs.logDirectory above. It contains a file local-1690571121168 with date/time stamp "Jul 28, 2023 4:36:04 PM", i.e., 15 seconds before the error messages. I don't think the file is related to launching the history server, as no newer files appear after further attempts to launch it. New files only show up when I run pyspark from the Conda prompt. On that note, it doesn't seem to matter whether or not I have pyspark running in another Conda prompt window; I get the same errors from launching the History Server either way.
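In local mode, Spark names the application `local-` followed by the time the SparkContext started, in milliseconds since the Unix epoch, so the event-log file name can be decoded as a timestamp. The suffix records when the application started, whereas the file's date/time stamp is its last modification time, so the two need not match. `app_start_time` is a hypothetical helper sketched under that assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def app_start_time(app_id: str) -> datetime:
    """Convert a local-mode application ID such as 'local-1690571121168' to a UTC datetime.

    Assumes the numeric suffix is milliseconds since the Unix epoch.
    """
    prefix, _, millis = app_id.rpartition("-")
    if prefix != "local" or not millis.isdigit():
        raise ValueError(f"not a local-mode application ID: {app_id!r}")
    # Integer milliseconds avoid floating-point rounding
    return EPOCH + timedelta(milliseconds=int(millis))


print(app_start_time("local-1690571121168").isoformat())
# 2023-07-28T19:05:21.168000+00:00
```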

Other Q&As examined

This Q&A differs from my situation in that the person actually got the history server running. For that question, this answer refers to a Master URL, which is obtained from the Spark Web UI, but I see no such URL:

[Screenshot of the Spark Web UI]

This Q&A addresses my problem, but the first answer is one of the two sources cited in the very first sentence of my question. The second answer doesn't appear to be relevant to the error that I see.

This Q&A suggests launching the history server from %SPARK_HOME%/sbin. The scripts there, however, are for Unix shells. The CMD scripts are in %SPARK_HOME%/bin, but running spark-class.cmd org.apache.spark.deploy.history.HistoryServer from there makes no difference.

Annex: Reams of errors from launching History Server

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/07/28 16:36:13 INFO HistoryServer: Started daemon with process name: 69808@Laptop-Hostname
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/C:/Users/User.Name/anaconda3/envs/py39/Lib/site-packages/pyspark/jars/spark-unsafe_2.12-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
23/07/28 16:36:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/07/28 16:36:18 INFO SecurityManager: Changing view acls to: User.Name
23/07/28 16:36:18 INFO SecurityManager: Changing modify acls to: User.Name
23/07/28 16:36:18 INFO SecurityManager: Changing view acls groups to:
23/07/28 16:36:18 INFO SecurityManager: Changing modify acls groups to:
23/07/28 16:36:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(User.Name); groups with view permissions: Set(); users  with modify permissions: Set(User.Name); groups with modify permissions: Set()
23/07/28 16:36:18 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions:
23/07/28 16:36:19 INFO Utils: Successfully started service 'HistoryServerUI' on port 18080.
23/07/28 16:36:19 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://Laptop-Hostname:18080
23/07/28 16:36:19 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[spark-history-task-0,5,main]
java.lang.UnsatisfiedLinkError: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)'
        at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793)
        at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1215)
        at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1420)
        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014)
        at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761)
        at org.apache.spark.deploy.history.FsHistoryProvider.checkForLogs(FsHistoryProvider.scala:482)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$startPolling$3(FsHistoryProvider.scala:299)
        at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1387)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$getRunner$1(FsHistoryProvider.scala:219)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
23/07/28 16:36:24 INFO ShutdownHookManager: Shutdown hook called

(py39) C:\Users\User.Name>
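Key lines like the ERROR and UnsatisfiedLinkError messages quoted near the top of this question can be pulled out of a log like this one automatically. As a rough sketch (the name `key_messages` and its line patterns are assumptions based on the log format shown here, not a Spark tool), the filter below keeps ERROR log lines and Java exception headers and drops stack frames:

```python
import re
from typing import List

# A stack-trace frame line starts with whitespace followed by "at ".
FRAME = re.compile(r"^\s+at ")
# An exception header looks like "java.lang.SomeError: message".
EXCEPTION = re.compile(r"^[\w$.]+(?:Error|Exception)\b")


def key_messages(log_text: str) -> List[str]:
    """Keep ERROR log lines and exception headers; skip stack frames and other lines."""
    keep = []
    for line in log_text.splitlines():
        if FRAME.match(line):
            continue
        if " ERROR " in line or EXCEPTION.match(line):
            keep.append(line.strip())
    return keep
```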
user2153235
