I've created a small application using Apache Spark. When I run the application locally everything runs fine, but when I submit it to a 6-node cluster I get a FileNotFoundException because it can't find the input file.

This is my tiny application.

import org.apache.spark.{SparkConf, SparkContext}

object YarnTest {
  def main(args: Array[String]) {
    val sparkContext = new SparkContext(new SparkConf())
    val tweets = sparkContext.textFile(args(0))

    tweets.map { line => (line, LanguageDetector.create().detect(line)) }
      .saveAsTextFile("/data/detected")
  }
}

I submit the application with the following command:

/opt/spark-1.0.2-bin-hadoop2/bin/spark-submit --class YarnTest --master spark://luthor-v1:7077 lang_detect.jar twitter_data 

After submitting I get the following exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 4 times, most recent failure: Exception failure in TID 6 on host luthor-v5: java.io.FileNotFoundException: File file:/opt/bb/twitter_data does not exist

The file is definitely there: the jar and the file are in the same directory, and the full path resolves correctly.

Thanks in advance

Mathias Lavaert
  • Have you tried typing the file extension too? It could be the cause of the problem. – Mikel Urkia Sep 22 '14 at 12:45
  • Try giving the full path to the input file (e.g. /root/twitter_data). Is it present on all the workers? – Dan Osipov Sep 22 '14 at 21:21
  • @DanOsipov No, the file isn't present on all worker nodes. It only resides on the master node. I'm guessing that is my problem. I could store my file on an HDFS cluster as suggested below. – Mathias Lavaert Sep 23 '14 at 08:09
  • I am having the same problem as the OP. I have copied the file to HDFS, but I still get the same error. Any suggestions? – Bhushan Apr 03 '15 at 19:29

2 Answers

spark-submit assumes that the jar resides in the current working directory and that the input path you pass refers to HDFS. Copy your file twitter_data from the local file system to HDFS like this:

hadoop fs -copyFromLocal twitter_data /twitter_data

This copies the file into the root (/) directory of HDFS. Now run the command:

spark-submit --class YarnTest --master spark://luthor-v1:7077 lang_detect.jar /twitter_data
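
You can also make the location explicit by passing a fully qualified HDFS URI instead of a bare path. A minimal sketch, assuming the NameNode runs on luthor-v1 on the default port 8020 (check fs.defaultFS in core-site.xml for your cluster):

// Hypothetical NameNode host and port; adjust to your cluster's fs.defaultFS
val tweets = sparkContext.textFile("hdfs://luthor-v1:8020/twitter_data")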
Anas
  • I am having the same problem as the OP. I have copied the file to HDFS, but I still get the same error. Any suggestions? – Bhushan Apr 03 '15 at 19:28
  • Is there a way to make the Spark context do that? I don't want to put the file into HDFS manually. Can't it do it by itself? – salvob Mar 14 '17 at 10:56
  • @salvob see [this](http://stackoverflow.com/questions/27299923/how-to-load-local-file-in-sc-textfile-instead-of-hdfs) question. – Anas Mar 14 '17 at 16:30
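
If copying the file into HDFS is not an option, a local path can still work, but only when the file exists at the same absolute path on the driver and on every worker node. A minimal sketch using an explicit file:// URI (the path is the one from the question; it must be present on all nodes):

// Only works if /opt/bb/twitter_data exists on every node in the cluster
val tweets = sparkContext.textFile("file:///opt/bb/twitter_data")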

The Hadoop configuration directory set in spark-env.sh (HADOOP_CONF_DIR) is probably not correct. Please check it; it should point to your_hadoop_dir/etc/hadoop/.
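
For reference, a minimal sketch of the relevant line in conf/spark-env.sh; the Hadoop install path below is an assumption and must match your own layout:

# In $SPARK_HOME/conf/spark-env.sh; /opt/hadoop is a hypothetical install path
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop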

Carlos AG