I am quite new to Spark Streaming, and I am stuck on saving my output.
My question is: how can I save the output of my JavaPairDStream to a text file that is rewritten on each batch with only the elements currently inside the DStream?
For example, with the wordCount example,
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
        new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<>(s, 1);
            }
        }).reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer i1, Integer i2) {
                return i1 + i2;
            }
        });
I would get the following output using wordCounts.print():
(Hello,1)
(World,1)
I would like to write those last lines into a text file that is refreshed each batch with the contents of wordCounts.
I've tried the following approach:
mappedRDD.dstream().saveAsTextFiles("output","txt");
This generates a new directory for every batch interval, each containing several part files, rather than a single text file.
Another approach would be (note the callback receives a JavaPairRDD, not a JavaPairDStream):
mappedRDD.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {
    @Override
    public Void call(JavaPairRDD<String, Integer> rdd, Time time) {
        // Something over rdd to save its contents to a file???
        return null;
    }
});
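One way that callback body could be filled in (this is a sketch of my idea, not tested against Spark) is to collect the batch to the driver and rewrite a single file each time. The snippet below shows only the file-writing step in plain Java: `writeBatch` is a hypothetical helper, and a `List<Map.Entry<String, Integer>>` stands in for the result of `rdd.collect()`. Whether collecting to the driver is acceptable depends on the size of each batch.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class BatchWriter {
    // Hypothetical helper: overwrite `path` with one "(key,value)" line per
    // entry, mimicking the layout that wordCounts.print() shows per batch.
    // Inside foreachRDD, `counts` would come from rdd.collect().
    static void writeBatch(Path path, List<Map.Entry<String, Integer>> counts)
            throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(path, StandardCharsets.UTF_8))) {
            for (Map.Entry<String, Integer> e : counts) {
                out.println("(" + e.getKey() + "," + e.getValue() + ")");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("wordcounts.txt");
        // Stand-in for one collected batch of wordCounts.
        List<Map.Entry<String, Integer>> batch = Arrays.asList(
                new SimpleEntry<>("Hello", 1),
                new SimpleEntry<>("World", 1));
        writeBatch(out, batch);
        System.out.println(String.join("|", Files.readAllLines(out)));
    }
}
```

Because the file is opened fresh on every call, each batch replaces the previous contents instead of accumulating part directories.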
I would appreciate some help.
Thank you