
I understand the following daemons are required for a Spark cluster:

  1. Master
  2. Workers (slaves)
  3. Driver (launched when the application is submitted)
  4. Executors (launched when the application is submitted)

I have some very basic questions about Spark when it is being set up on a YARN cluster:

  1. Are there any master or worker daemons started separately for Spark? I understand that the ResourceManager and NodeManagers of the YARN cluster itself will act as master and workers for Spark jobs. From this article http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/, it looks like there are no separate master/slave daemons for Spark on YARN.
  2. If the answer to the above question is no: when we are setting up Spark on an existing YARN cluster, do we need to start any persistent daemon at all before submitting the Spark application?
  3. Will any of the start/stop scripts inside the spark-1.5.0-bin-hadoop2.4/sbin directory be useful in this case?
  4. The Spark Web UI is not available once the driver has finished its execution. Am I correct?
KBR

1 Answer


Here are the answers to your questions:

  1. In YARN mode, you do not need to start Master, Worker, or Executor daemons yourself. You just need to submit your application to YARN, and YARN will manage the rest by itself. See the Deployment section of the Spark documentation for how to leverage YARN as the cluster manager.
  2. If your YARN cluster is up and running and ready to serve, then you don't need any other daemons.
  3. It depends on what you want to do, but scripts like SPARK_HOME/sbin/spark-config.sh or SPARK_HOME/sbin/start-history-server.sh can be useful.
  4. The standalone Spark Web UI is available only in standalone mode. On YARN, the driver UI is available while your job is being executed; to analyze jobs after they have finished, you need to switch on the history server.
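To make points 1 and 4 concrete, here is a minimal command-line sketch of submitting an application to an existing YARN cluster and then enabling the history server. The paths, class name, and resource sizes are illustrative, not taken from your setup:

```shell
# Submitting a Spark application to an existing YARN cluster.
# No separate Spark master/worker daemons need to be running;
# YARN's ResourceManager/NodeManagers allocate the containers.

# Point Spark at the YARN cluster configuration
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Spark 1.5 syntax: "yarn-cluster" runs the driver inside YARN,
# while "yarn-client" would keep the driver on the submitting machine.
./bin/spark-submit \
  --master yarn-cluster \
  --num-executors 4 \
  --executor-memory 2g \
  --class com.example.MyApp \
  /path/to/my-app.jar

# Optional: start the history server so finished jobs can still
# be inspected in a web UI after the driver exits (question 4)
./sbin/start-history-server.sh
```

Note that `spark-submit` is the only command needed per application; the history server is the one long-lived daemon worth running, and only if you want to browse completed jobs.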
Sumit
  • Thanks Sumit. Actually, I was going through this link to set up Spark on YARN: http://backtobazics.com/big-data/6-steps-to-setup-apache-spark-1-0-1-multi-node-cluster-on-centos/. If you look at step 3 on that page, it starts some daemons from the sbin folder after the basic configuration is set up. – KBR Feb 01 '16 at 03:39
  • In that blog, the author is setting up and running the sample Spark job in standalone mode; YARN is only used for HDFS. This [Link](http://stackoverflow.com/questions/20793694/spark-yarn-client-mode) will give you more clarity on the various Spark deployment modes. – Sumit Feb 01 '16 at 08:07