With an RDD, I can call rdd.saveAsTextFile('directory'), which saves the output to hdfs://directory. Can the text file be saved directly to a directory on the local filesystem instead (i.e. directory)?

cshin9
    Possible duplicate of [Save a spark RDD to the local file system using Java](http://stackoverflow.com/questions/31239161/save-a-spark-rdd-to-the-local-file-system-using-java) – DNA May 18 '16 at 22:09

1 Answer

Yes, you can. saveAsTextFile('directory') writes one file per partition, so you first need to merge those files before copying the result to the local filesystem (unless you want to copy each part file individually). Note that FileUtil.copyMerge is available in Hadoop 2.x; it was removed in Hadoop 3.0. So first call

import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
FileUtil.copyMerge(sourceFileSystem, new Path(sourceFullPath), destFileSystem, new Path(destinationFullPath), true, sparkContext.hadoopConfiguration, null)

and afterwards use

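val fs = FileSystem.get(yourConfiguration)
// copyToLocalFile takes Path arguments; delSrc = true deletes the merged source file after copying
fs.copyToLocalFile(true, new Path(destinationFullPath), new Path(localFilePath))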
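
If you take the other route and copy each part file to local first, the merge can be done on the local side instead. Here is a minimal sketch in Python, assuming the output directory is already on the local filesystem and uses the usual part-NNNNN file names. The function name merge_part_files and both path arguments are hypothetical, not part of Spark or Hadoop.

```python
import glob
import os
import shutil


def merge_part_files(part_dir, merged_path):
    """Concatenate the part-* files in part_dir into merged_path.

    Hypothetical helper: assumes part_dir is a local copy of a
    saveAsTextFile output directory. Returns the number of files merged.
    """
    # Sorting by name keeps partition order (part-00000, part-00001, ...).
    # The pattern skips marker files such as _SUCCESS and hidden .crc files.
    part_paths = sorted(glob.glob(os.path.join(part_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Stream each part so large files are not loaded into memory.
                shutil.copyfileobj(part, merged)
    return len(part_paths)
```

Write merged_path somewhere whose name does not start with part-, otherwise a second run would pick up the merged file as an input.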
Felix