
I want to deploy Spark Job Server (in a Docker container) on a different host to the Spark Master. However, the server_start.sh script seems to assume that it is being run on the same machine as the Spark Master. E.g.:

if [ -z "$SPARK_CONF_DIR" ]; then
  SPARK_CONF_DIR=$SPARK_HOME/conf
fi

# Pull in other env vars in spark config, such as MESOS_NATIVE_LIBRARY
. $SPARK_CONF_DIR/spark-env.sh

Under the Architecture section it says:

The job server is intended to be run as one or more independent processes, separate from the Spark cluster (though it very well may be colocated with say the Master).

Does anyone know how the server_start.sh script can be made to work as-is with a Spark Master hosted on a different machine to Spark Job Server?

snark
  • See this: http://stackoverflow.com/questions/30022086/is-it-always-the-case-that-driver-must-be-on-a-master-node-yes-no-apache-spa/30023864 – Alister Lee May 27 '15 at 12:08
  • Thanks @Alister Lee. That reinforces the idea that Spark Job Server could _in theory_ be made to run apart from the Spark Master's node, but it doesn't tell me whether Job Server's `server_start.sh` script will work in that situation. At the moment it looks like I'll have to install Spark on the same machine as Spark Job Server, if only so the latter can have Spark's jars on its classpath. – snark May 27 '15 at 12:56
  • Those start scripts are intended to start the master and worker applications on the hardware nodes that should fill those roles. The job server runs spark-submit and it takes the URL of the (running) master. Starting the master and submitting a job are decoupled. – Alister Lee May 27 '15 at 23:49
  • Sorry, I think we're talking at cross purposes. I'm assuming the Spark cluster is already started and I'm not trying to touch that at all. The server_start.sh script I'm referring to is for Spark Job Server, not for Spark itself. It's just that the server_start.sh script seems to require Spark to be installed on the same machine as Job Server, even if that is not the same instance of Spark which you ultimately want to submit jobs to via Job Server's REST API. – snark May 29 '15 at 10:33
  • You linked to it, but I did not read it... sorry, I have no knowledge of Job Server. Apologies! – Alister Lee May 31 '15 at 00:46
  • Perhaps raise an issue, or a question in the wiki, for the maintainer on GitHub? – Alister Lee May 31 '15 at 00:46
  • The short answer was that I was able to get Spark Job Server deployed on a different host to the Spark Master and have the former successfully submit jobs to the latter. However, I had to install a built copy of Spark in the Docker container for Job Server's server_start.sh script to work. I also had to modify the script in other ways, such as to allow the URI of the Spark Master to be passed in and edited into Job Server's local.conf file (roughly along the lines of the sketch after these comments). I don't know why Spark Job Server doesn't include the relevant Spark libraries in the fat jar it makes when you run `sbt job-server/assembly`. – snark Jun 03 '15 at 09:54
  • For information, I had the same issue when trying to deploy for production. I also encountered Hadoop-related issues (the configuration with information about the cluster didn't exist on the vanilla jobserver box). I started going through and configuring Hadoop on that vanilla box, but it became too much of a cost overhead, and seeing as the master is running at low CPU, I eventually deployed the jobserver on the master. – andrew.butkus Feb 01 '16 at 10:57
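
For reference, a minimal sketch of the kind of wrapper described in the comments above; the SPARK_MASTER variable, the /opt/... paths, and the sed edit are illustrative assumptions rather than anything shipped with Job Server:

#!/bin/bash
# Hypothetical wrapper around Job Server's server_start.sh: inject the Spark
# Master URI into local.conf before starting. SPARK_MASTER and the /opt/...
# paths are placeholders for your own deployment.
set -e

: "${SPARK_MASTER:?Set SPARK_MASTER, e.g. spark://master-node:7077}"
export SPARK_HOME=/opt/spark   # a built Spark install is still needed locally

# Point Job Server's config at the remote master
sed -i "s|^\( *master *= *\).*|\1\"${SPARK_MASTER}\"|" /opt/job-server/local.conf

/opt/job-server/server_start.sh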

2 Answers


You can set the master URL in your local.conf. See here for a sample: https://github.com/spark-jobserver/spark-jobserver/blob/master/job-server/config/local.conf.template#L7

You need to replace "local[4]" with your master's URL, e.g. "spark://master-node:7077".
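
For illustration, a minimal sketch of that change (the host name master-node and the /opt/job-server path are placeholders for your deployment):

# Switch Job Server's config from the embedded local master to a remote
# standalone master, i.e. change
#   master = "local[4]"
# to
#   master = "spark://master-node:7077"
sed -i 's|master = "local\[4\]"|master = "spark://master-node:7077"|' /opt/job-server/local.conf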

noorul

In addition to noorul's answer, I would like to add that you can also use "yarn-client", or whatever master URL you need. In that case, though, take into account that you need to set HADOOP_CONF_DIR or YARN_CONF_DIR. You can find more information here. You will then also have to take care of the user executing the job server, so that she will be able to write to HDFS when using YARN, for example. A rough sketch follows below.
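
As a rough sketch (the /etc/hadoop/conf and /opt/job-server paths and the sed edit are illustrative assumptions, not part of the Job Server distribution):

# Submit through YARN instead of a standalone master.
# Assumes Job Server is deployed under /opt/job-server and the Hadoop client
# configuration lives in /etc/hadoop/conf.
export HADOOP_CONF_DIR=/etc/hadoop/conf   # or YARN_CONF_DIR, depending on your setup
sed -i 's|^\( *\)master = ".*"|\1master = "yarn-client"|' /opt/job-server/local.conf
/opt/job-server/server_start.sh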

Markon