
Code:

val badData: RDD[ListBuffer[String]] = rdd.filter(line => line(1).equals("XX") || line(5).equals("XX"))
badData.coalesce(1).saveAsTextFile(propForFile.getString("badDataFilePath"))

The first time, the program runs fine. On running it again, it throws a FileAlreadyExists exception. I want to resolve this using the FileUtils Java functionality and save the RDD as a text file.

Shaido

3 Answers


Before you write the file to a specified path, delete the already existing path.

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)
fs.delete(new Path("bad/data/file/path"), true)

Then perform your usual write process. Hope this resolves the problem.
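Since the question specifically asks about FileUtils: if the output path is on the local filesystem rather than HDFS, the same cleanup can be done with plain JVM APIs, roughly equivalent to Apache Commons' `FileUtils.deleteDirectory`. A minimal sketch, assuming a local path; the `deleteRecursively` name is my own:

```scala
import java.nio.file.{Files, Paths, Path => JPath}
import java.util.Comparator

// Recursively delete a local output directory if it exists.
// Note: this only works for local paths, not HDFS -- for HDFS,
// use Hadoop's FileSystem.delete as shown above.
def deleteRecursively(dir: String): Unit = {
  val root = Paths.get(dir)
  if (Files.exists(root)) {
    Files.walk(root)
      .sorted(Comparator.reverseOrder[JPath]()) // delete children before parents
      .forEach(p => Files.delete(p))
  }
}
```

Calling it on a path that does not exist is a no-op, so it is safe to run unconditionally before `saveAsTextFile`.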

Dasarathy D R
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
if (fs.exists(new Path("path/to/the/files")))
    fs.delete(new Path("path/to/the/files"), true)

Pass the path as a String to the method; if the directory or files are present, they will be deleted. Use this piece of code before writing to the output path.
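Following that suggestion, the check-and-delete step can be wrapped in a small reusable method. A hedged sketch, assuming an active `SparkSession` named `spark`; `deleteIfExists` is a hypothetical helper name, not a Spark API:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Hypothetical helper: deletes the given path (file or directory) if it exists.
def deleteIfExists(spark: SparkSession, pathStr: String): Unit = {
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  val p = new Path(pathStr)
  if (fs.exists(p)) fs.delete(p, true) // true = recursive
}

// Call it just before writing the output:
deleteIfExists(spark, "path/to/the/files")
rdd.coalesce(1).saveAsTextFile("path/to/the/files")
```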

Pyd
  • While this might answer the author's question, it lacks some explaining words and links to documentation. Raw code snippets are not very helpful without some phrases around it. You may also find [how to write a good answer](https://stackoverflow.com/help/how-to-answer) very helpful. Please edit your answer. – hellow Sep 04 '18 at 09:13

Why not use DataFrames? Get the RDD[ListBuffer[String]] into an RDD[Row], something like:

import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

val badData: RDD[Row] = rdd.map(line =>
    Row(line(0), line(1)... line(n)))
  .filter(row => filter stuff)
badData.toDF().write.mode(SaveMode.Overwrite)
uh_big_mike_boi
  • 1
    Other than that, this guy might be using the FileUtils library in a way you like https://stackoverflow.com/questions/24371259/how-to-make-saveastextfile-not-split-output-into-multiple-file – uh_big_mike_boi Aug 28 '18 at 04:50