
I am reading my data as whole text files. Each file becomes an object of type Article, a class I defined. Here's how the data is read and processed:

JavaPairRDD<String, String> filesRDD = context.wholeTextFiles(inputDataPath);
JavaRDD<Article> processingFiles = filesRDD.map(fileNameContent -> {
    String content = fileNameContent._2();
    Article a = new Article(content);
    return a;
});

Now, once every file has been processed separately, I would like to write each result to HDFS as its own separate file, i.e. not with saveAsTextFile. I assume I have to do it with foreach, something like:

processingFiles.foreach(a -> {
    // Here is pseudocode of how I want to do this
    String fileName = here_is_full_file_name_to_write_to_hdfs;
    writeToDisk(fileName, a); // This could be a simple text file
});
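A minimal, self-contained sketch of the writeToDisk helper from the pseudocode above, assuming Article simply wraps a String (the field name is an assumption). For simplicity this version writes to the local filesystem with java.nio; on a cluster you would instead open the output stream through Hadoop's FileSystem API (org.apache.hadoop.fs.FileSystem#create) inside the foreach so each executor writes its files directly to HDFS:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteToDiskDemo {

    // Hypothetical stand-in for the Article class from the question.
    static class Article {
        final String content;
        Article(String content) { this.content = content; }
    }

    // A writeToDisk helper like the pseudocode above. Here it writes to the
    // local filesystem; for HDFS you would obtain the stream from
    // org.apache.hadoop.fs.FileSystem#create(new Path(fileName)) instead.
    static void writeToDisk(String fileName, Article a) throws IOException {
        Path out = Paths.get(fileName);
        Files.write(out, a.content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Article a = new Article("hello from article");
        writeToDisk("article-0.txt", a);

        // Read the file back to show the round trip worked.
        System.out.println(new String(
                Files.readAllBytes(Paths.get("article-0.txt")),
                StandardCharsets.UTF_8));
    }
}
```

Note that whatever handle you use for writing (a FileSystem instance, a writer, etc.) must be created inside the foreach lambda, not on the driver, because those handles are not serializable.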

Any ideas how to do this in Java?

Belphegor
  • Have you tried using the standard java IO to open the file, write the contents to that file, and then close the file within the foreach? (This may give you a serialization error, but if it works you're done.) – Matthew Gray Apr 12 '16 at 16:43
  • You can look at using the HDFS FileSystem APIs. Refer to the [stackoverflow answer here](http://stackoverflow.com/questions/16000840/write-a-file-in-hdfs-with-java) – urug Apr 12 '16 at 16:54
  • Have you looked at this http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job? – evgenii Apr 13 '16 at 20:57

0 Answers