
I'm trying to submit a Spark app from my local machine's Terminal to my cluster. I'm using --master yarn-cluster. I need the driver program to run on my cluster too, not on the machine from which I submit the application, i.e. my local machine.

I'm using

    bin/spark-submit \
      --class com.my.application.XApp \
      --master yarn-cluster \
      --executor-memory 100m \
      --num-executors 50 \
      hdfs://name.node.server:8020/user/root/x-service-1.0.0-201512141101-assembly.jar \
      1000

and am getting this error:

Diagnostics: java.io.FileNotFoundException: File file:/Users/nish1013/Dev/spark-1.4.1-bin-hadoop2.6/lib/spark-assembly-1.4.1-hadoop2.6.0.jar does not exist

I can see in my service list that

  • YARN + MapReduce2 2.7.1.2.3 Apache Hadoop NextGen MapReduce (YARN)
  • Spark 1.4.1.2.3 Apache Spark is a fast and general engine for
    large-scale data processing.

already installed.

My spark-env.sh on the local machine contains:

export HADOOP_CONF_DIR=/Users/nish1013/Dev/hadoop-2.7.1/etc/hadoop

Has anyone encountered anything similar before?

nish1013
  • If you're running it on the cluster, then your local settings are not relevant. You should check the settings and the filesystem of the cluster's nodes – mgaido Dec 21 '15 at 12:21
  • Thank you. I'm not sure why it is then complaining about a local file? – nish1013 Dec 21 '15 at 13:40
  • Spark needs that jar to run. According to the configuration of your installation, that jar is assumed to be located in the folder you mentioned. You have two options: you can put the jar locally on all your cluster machines and configure each of them properly, or you can put it into HDFS. – mgaido Dec 21 '15 at 13:54
  • I added that jar to HDFS. Where should I configure the location for that jar? – nish1013 Dec 21 '15 at 14:10
  • On the worker nodes of your cluster – mgaido Dec 21 '15 at 14:11
  • Not sure what you mean by configuring in the cluster. Currently I have copied that jar to HDFS in the cluster, and now I'm looking for a config param or submit argument to provide this location – nish1013 Dec 21 '15 at 14:18
  • I mean pointing the `HADOOP_CONF_DIR` variable to the HDFS location where you put the jar. – mgaido Dec 21 '15 at 14:52
  • This question was spawned by http://stackoverflow.com/questions/34391977/spark-submit-does-automatically-upload-the-jar-to-cluster/34516023#34516023, might want to mention that? Very interesting question and great comments in both places! Worth a read; it puts the lie to some Spark docs that suggest this all happens "automagically" – JimLohse Dec 29 '15 at 17:39

1 Answer


I think the right command is something like the following. Note that on Spark 1.x the property is `spark.yarn.jar` (singular; Spark 2.x renamed it to `spark.yarn.jars`), and it should point to the Spark assembly jar you uploaded to HDFS, while your application jar is still passed as the positional argument:

    bin/spark-submit \
      --class com.my.application.XApp \
      --master yarn-cluster \
      --executor-memory 100m \
      --num-executors 50 \
      --conf spark.yarn.jar=hdfs://name.node.server:8020/user/root/spark-assembly-1.4.1-hadoop2.6.0.jar \
      hdfs://name.node.server:8020/user/root/x-service-1.0.0-201512141101-assembly.jar \
      1000

(adjust the `spark.yarn.jar` path to wherever you actually uploaded the assembly jar), or you can add the equivalent `spark.yarn.jar` line to your `spark-defaults.conf` file.
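For completeness, here is a minimal sketch of the `spark-defaults.conf` entry described above. The HDFS path is an assumption; use whatever location you uploaded the assembly jar to (e.g. with `hdfs dfs -put`):

    # conf/spark-defaults.conf
    # Spark 1.x uses spark.yarn.jar (singular); Spark 2.x renamed it to spark.yarn.jars.
    # The assembly jar must already exist at this HDFS path, uploaded e.g. with:
    #   hdfs dfs -put lib/spark-assembly-1.4.1-hadoop2.6.0.jar /user/root/
    spark.yarn.jar hdfs://name.node.server:8020/user/root/spark-assembly-1.4.1-hadoop2.6.0.jar

With this in place, spark-submit no longer needs to find (or upload) the assembly jar from the local machine, which is what the `FileNotFoundException` above is complaining about.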

ascetic652