I've created a small application using Apache Spark. When I run the application locally, everything works fine, but when I submit it to a 6-node cluster I get a FileNotFoundException because it can't find the input file.
This is my tiny application:
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val sparkContext = new SparkContext(new SparkConf())
  // args(0) is the input path; detect the language of each line
  val tweets = sparkContext.textFile(args(0))
  tweets.map { line => (line, LanguageDetector.create().detect(line)) }
    .saveAsTextFile("/data/detected")
}
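If it matters: args(0) is just the bare name twitter_data, so as I understand it Spark resolves it against the default filesystem. A sketch of what I could pass instead, assuming the data were in HDFS (the namenode host and port below are placeholders, not my actual setup):

// Hypothetical alternative: a fully qualified HDFS URI instead of a bare path.
// "luthor-v1:8020" is a placeholder for the actual namenode address.
val tweets = sparkContext.textFile("hdfs://luthor-v1:8020/data/twitter_data")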
I submit the application with the following command:
/opt/spark-1.0.2-bin-hadoop2/bin/spark-submit --class YarnTest --master spark://luthor-v1:7077 lang_detect.jar twitter_data
After submitting, I get the following exception:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 4 times, most recent failure: Exception failure in TID 6 on host luthor-v5: java.io.FileNotFoundException: File file:/opt/bb/twitter_data does not exist
The file is definitely there: the jar and the file are in the same directory, and the full path resolves correctly on the machine I submit from.
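Does the exception mean each worker (here luthor-v5) resolves the file:/ path against its own local disk? If so, I suppose a local URI could only work if the file existed at the same path on every node first, something like:

// Assumption: /opt/bb/twitter_data has been copied to every worker node beforehand.
// With a file:// URI, each executor reads its own local copy of the file.
val tweets = sparkContext.textFile("file:///opt/bb/twitter_data")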
Thanks in advance