
I'm trying to submit a Spark app from my local machine's Terminal to my cluster. I'm using --master yarn-cluster. I need the driver program to run on my cluster too, not on the machine from which I submit the application, i.e. my local machine.

I'm using

    bin/spark-submit \
      --class com.my.application.XApp \
      --master yarn-cluster \
      --executor-memory 100m \
      --num-executors 50 \
      hdfs://name.node.server:8020/user/root/x-service-1.0.0-201512141101-assembly.jar \
      1000

and am getting this error:

Diagnostics: java.io.FileNotFoundException: File file:/Users/nish1013/Dev/spark-1.4.1-bin-hadoop2.6/lib/spark-assembly-1.4.1-hadoop2.6.0.jar does not exist

I can see in my service list that

  • YARN + MapReduce2 2.7.1.2.3 Apache Hadoop NextGen MapReduce (YARN)
  • Spark 1.4.1.2.3 Apache Spark is a fast and general engine for
    large-scale data processing.

already installed.

My spark-env.sh on the local machine contains:

export HADOOP_CONF_DIR=/Users/nish1013/Dev/hadoop-2.7.1/etc/hadoop

Has anyone encountered anything similar before?

nish1013
  • If you're running it on the cluster, then your local settings are not relevant. You should check the settings and the filesystem of the cluster's nodes – mgaido Dec 21 '15 at 12:21
  • Thank you. I'm not sure why it is then complaining about a local file? – nish1013 Dec 21 '15 at 13:40
  • Spark needs that jar to run. According to the configuration of your installation, that jar is assumed to be located in the folder you mentioned. You have two options: you can put the jar locally on all your cluster machines and configure each of them properly, or you can put it into HDFS. – mgaido Dec 21 '15 at 13:54
  • I added that jar to HDFS. Where should I configure the location for that jar? – nish1013 Dec 21 '15 at 14:10
  • On the worker nodes of your cluster – mgaido Dec 21 '15 at 14:11
  • Not sure what you mean by configuring in the cluster. Currently I have copied that jar to HDFS in the cluster, and now I'm looking for a config param or submit argument to provide this location – nish1013 Dec 21 '15 at 14:18
  • I mean pointing the `HADOOP_CONF_DIR` variable to the HDFS location where you put the jar. – mgaido Dec 21 '15 at 14:52
  • This question was spawned by http://stackoverflow.com/questions/34391977/spark-submit-does-automatically-upload-the-jar-to-cluster/34516023#34516023, might want to mention that? Very interesting question and great comments in both places! Worth a read; it puts the lie to some Spark docs that suggest this all happens "automagically" – JimLohse Dec 29 '15 at 17:39

1 Answer


I think the right command is something like the following. Note that on Spark 1.x the property is `spark.yarn.jar` (singular; Spark 2.x renamed it to `spark.yarn.jars`), and it should point to the Spark assembly jar you uploaded to HDFS, while your application jar is still passed as the positional argument:

    bin/spark-submit \
      --class com.my.application.XApp \
      --master yarn-cluster \
      --executor-memory 100m \
      --num-executors 50 \
      --conf spark.yarn.jar=hdfs://name.node.server:8020/user/root/spark-assembly-1.4.1-hadoop2.6.0.jar \
      hdfs://name.node.server:8020/user/root/x-service-1.0.0-201512141101-assembly.jar \
      1000

(adjust the `spark.yarn.jar` path to wherever you actually uploaded the assembly jar), or you can add the equivalent `spark.yarn.jar` line to your `spark-defaults.conf` file.
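For completeness, here is a minimal sketch of the `spark-defaults.conf` entry described above. The HDFS path is an assumption; use whatever location you uploaded the assembly jar to (e.g. with `hdfs dfs -put`):

    # conf/spark-defaults.conf
    # Spark 1.x uses spark.yarn.jar (singular); Spark 2.x renamed it to spark.yarn.jars.
    # The assembly jar must already exist at this HDFS path, uploaded e.g. with:
    #   hdfs dfs -put lib/spark-assembly-1.4.1-hadoop2.6.0.jar /user/root/
    spark.yarn.jar hdfs://name.node.server:8020/user/root/spark-assembly-1.4.1-hadoop2.6.0.jar

With this in place, spark-submit no longer needs to find (or upload) the assembly jar from the local machine, which is what the `FileNotFoundException` above is complaining about.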

ascetic652