
I am running this command line:

hadoop fs -rm -r /tmp/output

And then a Java 8 Spark job with this main():

    SparkConf sparkConf = new SparkConf();
    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    JavaRDD<JSONObject> rdd = sc.textFile("/tmp/input")
            .map(s -> new JSONObject(s));
    rdd.saveAsTextFile("/tmp/output");
    sc.stop();

And I get this error:

ERROR ApplicationMaster: User class threw exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /tmp/output already exists

Any idea how to fix it?

Uri Goren
  • I have used the following command in the SparkConf and it works perfectly well `yourSparkConf.set("spark.hadoop.validateOutputSpecs", "false")` – ypriverol Oct 25 '17 at 09:37

1 Answer


You removed the directory from HDFS, but Spark is trying to save to the local file system. A relative or bare path like `/tmp/output` is resolved against the default file system (`fs.defaultFS`), which in your setup appears to be the local one.

To save to HDFS explicitly, try this:

rdd.saveAsTextFile("hdfs://<URL-hdfs>:<PORT-hdfs>/tmp/output");

The default for localhost is:

rdd.saveAsTextFile("hdfs://localhost:9000/tmp/output");

Another solution is to remove /tmp/output from your local file system before running the job.
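A third option (a sketch, assuming a standard Hadoop setup) is to delete the output path from inside the job itself, using the Hadoop `FileSystem` API resolved against the same configuration Spark uses. That way the path is removed from whichever file system Spark actually targets, local or HDFS:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// ... inside main(), after creating the JavaSparkContext `sc`:

// Reuse Spark's Hadoop configuration so the path resolves
// against the same fs.defaultFS that saveAsTextFile will use.
Configuration conf = sc.hadoopConfiguration();
FileSystem fs = FileSystem.get(conf);

Path out = new Path("/tmp/output");
if (fs.exists(out)) {
    fs.delete(out, true); // true = recursive delete
}

rdd.saveAsTextFile(out.toString());
```

This avoids the mismatch between `hadoop fs -rm` and the file system the job writes to, since both the check and the save go through the same configuration.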

Best regards

avr
DanielVL