I am reading my data as whole text files. My object is of type Article
which I defined. Here's the reading and processing of the data:
JavaPairRDD<String, String> filesRDD = context.wholeTextFiles(inputDataPath);
JavaRDD<Article> processingFiles = filesRDD.map(fileNameContent -> {
String content = fileNameContent._2();
Article a = new Article(content);
return a;
}
Now, once every file has been processed separately, I would like to write the result on HDFS as a separate file to, not with saveAsTextFile
. I know that probably I have to do it with foreach
, so:
processingFiles.foreach(a -> {
// Here is a pseudo code of how I want to do this
String fileName = here_is_full_file_name_to_write_to_hdfs;
writeToDisk(fileName, a); // This could be a simple text file
});
Any ideas how to do this in Java?