
I wanted to use Spark's History Server to make use of the logging mechanisms of my Web UI, but I find some difficulty in running this code on my Windows machine.

I have done the following:

Set my spark-defaults.conf file to reflect

spark.eventLog.enabled=true
spark.eventLog.dir=file://C:/spark-1.6.2-bin-hadoop2.6/logs
spark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs

My spark-env.sh to reflect:

SPARK_LOG_DIR    "file://C:/spark-1.6.2-bin-hadoop2.6/logs"
SPARK_HISTORY_OPTS   "-Dspark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs"

I am using Git-BASH to run the start-history-server.sh file, like this:

USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh

And, I get this error:

USERA@SYUHUH MINGW64 /c/spark-1.6.2-bin-hadoop2.6/sbin
$ sh start-history-server.sh
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 69: SPARK_LOG_DIR: command not found
C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 70: SPARK_HISTORY_OPTS: command not found
ps: unknown option -- o
Try `ps --help' for more information.
starting org.apache.spark.deploy.history.HistoryServer, logging to C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out
ps: unknown option -- o
Try `ps --help' for more information.
failed to launch org.apache.spark.deploy.history.HistoryServer:
  Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
  ========================================
full log in C:\spark-1.6.2-bin-hadoop2.6/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-SGPF02M9ZB.out

The full log from the output can be found below:

Spark Command: C:\Program Files (x86)\Java\jdk1.8.0_91\bin\java -cp C:\spark-1.6.2-bin-hadoop2.6/conf\;C:\spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-api-jdo-3.2.6.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-core-3.2.10.jar;C:\spark-1.6.2-bin-hadoop2.6\lib\datanucleus-rdbms-3.2.9.jar -Xms1g -Xmx1g org.apache.spark.deploy.history.HistoryServer
========================================

I am running a sparkR script where I initialize my spark context and then call init().

Please advise whether I should be running the history server before I run my spark script?

Pointers & tips to proceed(with respect to logging) would be greatly appreciated.

turnip424

2 Answers


On Windows you need to run Spark's .cmd scripts, not the .sh ones. From what I saw, there is no .cmd script for the Spark history server, so it basically has to be started manually.

I followed the Linux history-server script, and to run it manually on Windows you need to take the following steps:

  • All history-server configuration goes in the spark-defaults.conf file (remove the .template suffix), as described below
  • Go to the Spark config directory and add the spark.history.* settings to %SPARK_HOME%/conf/spark-defaults.conf, as follows:

    spark.eventLog.enabled true
    spark.history.fs.logDirectory file:///c:/logs/dir/path

  • After the configuration is finished, run the following command from %SPARK_HOME%:

    bin\spark-class.cmd org.apache.spark.deploy.history.HistoryServer

  • The output should look something like this:

    16/07/22 18:51:23 INFO Utils: Successfully started service on port 18080.
    16/07/22 18:51:23 INFO HistoryServer: Started HistoryServer at http://10.0.240.108:18080
    16/07/22 18:52:09 INFO ShutdownHookManager: Shutdown hook called
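The steps above can be condensed into a small Git-Bash wrapper. A minimal sketch, where the default SPARK_HOME is an assumption taken from the question's install path (adjust it to your own install):

```shell
#!/bin/sh
# Sketch: build and print the history-server launch command on Windows.
# The SPARK_HOME default below is an assumption from the question's setup.
SPARK_HOME="${SPARK_HOME:-/c/spark-1.6.2-bin-hadoop2.6}"
CMD="$SPARK_HOME/bin/spark-class.cmd org.apache.spark.deploy.history.HistoryServer"
echo "Launching: $CMD"
# "$CMD"  # uncomment to actually start the server (it blocks the terminal)
```

Printing the command first makes it easy to verify the path before actually launching the server.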

Hope that it helps! :-)

Eyal.Dahari
  • I followed your advice & changed the lines to reflect "export.." & now I get this error: C:\spark-1.6.2-bin-hadoop2.6/conf/spark-env.sh: line 69: export: `file://C:/spark-1.6.2-bin-hadoop2.6/logs': not a valid identifier & spark-env.sh: line 70: export: `- spark.history.fs.logDirectory=file://C:/spark-1.6.2-bin-hadoop2.6/logs': not a valid identifier. " – turnip424 Jul 18 '16 at 01:15
  • I'm not sure what I am missing here. I have a "logs" directory created & I run the history server command before my application starts.. – turnip424 Jul 18 '16 at 01:18
  • Now I got it that you are running on Windows. spark-env.sh is a Linux script. I have amended my answer accordingly. Please also note that the file path format was missing a '/'. Should be three '/'. I have changed it in my answer. I would also try to put it in a different directory just for the sake of the test and make sure I have all the relevant permissions. – Eyal.Dahari Jul 18 '16 at 05:39
  • Thanks for writing back. As advised, I made the changes only to my spark-defaults.conf file. It only contains the lines: spark.eventLog.enabled true spark.eventLog.dir file:///C:/spark-1.6.2-bin-hadoop2.6/logs spark.history.fs.logDirectory file:///C:/spark-1.6.2-bin-hadoop2.6/logs & when I use gitbash to run my "sh start-history-server.sh" command, I get the error as " ps: unknown option -- o. – turnip424 Jul 20 '16 at 03:22

In case anyone gets the following exception:

17/05/12 20:27:50 ERROR FsHistoryProvider: Exception encountered when attempting to load application log file:/C:/Spark/Logs/spark--org.apache.spark.deploy.history.HistoryServer-1-Arsalan-PC.out
java.lang.IllegalArgumentException: Codec [out] is not available. Consider setting spark.io.compression.codec=snappy
        at org.apache.spark.io.CompressionCodec$$anonfun$createCodec$1.apply(Com

Just go to %SPARK_HOME%/conf/spark-defaults.conf and set spark.eventLog.compress to false.
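If you prefer not to edit the file by hand, the same change can be scripted from Git-Bash. A minimal sketch, where the default SPARK_HOME is an assumption based on the install path used elsewhere on this page:

```shell
#!/bin/sh
# Sketch: append the compression setting to spark-defaults.conf.
# The SPARK_HOME default below is an assumption; point it at your install.
CONF="${SPARK_HOME:-/c/spark-1.6.2-bin-hadoop2.6}/conf/spark-defaults.conf"
echo "spark.eventLog.compress false" >> "$CONF"
grep "spark.eventLog.compress" "$CONF"  # confirm the line was added
```

Restart the history server after changing the file so the new setting takes effect.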