I have found out how to write multiple outputs based on the key in Spark, but I need to do the same in append mode.
I am using saveAsHadoopFile() to write the files, but my generateFileNameForKeyValue() method builds the file name from the key, and many different keys produce the same file name, so those files are being overwritten.
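To make the collision concrete, the sketch below copies the sanitising logic from the generateFileNameForKeyValue() method shown further down into a plain static method and runs it on a few hypothetical keys. It uses only the JDK (no Hadoop classes), and the key values are made up for illustration.

```java
public class FileNameCollision {

    // Same sanitising logic as generateFileNameForKeyValue() in the question:
    // drop spaces, then drop every character outside [a-zA-Z0-9_-].
    static String fileName(String key, String name) {
        return key.replaceAll(" ", "").replaceAll("[^a-zA-Z0-9_-]", "") + "." + name + ".out";
    }

    public static void main(String[] args) {
        // Hypothetical keys that differ only in spaces and punctuation.
        String[] keys = {"New York", "NewYork", "New York!"};
        for (String key : keys) {
            // All three print the same target file name.
            System.out.println(key + " -> " + fileName(key, "part-00000"));
        }
    }
}
```

All three keys end up as NewYork.part-00000.out, so their records compete for the same file.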
Is there a way to write to those files in append mode?
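For illustration only, this is the difference between overwrite and append for two records whose keys map to the same file name. The sketch uses java.nio on a local temporary file, not Spark, Hadoop or HDFS, and the file name and record text are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

public class AppendVsOverwrite {

    // Writes one line, either truncating the existing content or appending to it.
    static void writeLine(Path file, String line, boolean append) throws IOException {
        StandardOpenOption mode = append ? StandardOpenOption.APPEND : StandardOpenOption.TRUNCATE_EXISTING;
        Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, mode);
    }

    // Writes two records that target the same (hypothetical) file and returns what survives.
    static List<String> run(boolean append) {
        try {
            Path file = Files.createTempFile("NewYork.part-00000", ".out");
            try {
                writeLine(file, "record for key 'New York'", append);
                writeLine(file, "record for key 'New York!'", append);
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("overwrite: " + run(false)); // only the last record is left
        System.out.println("append:    " + run(true));  // both records are kept
    }
}
```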
One alternative is to call groupByKey() before saveAsHadoopFile(), but I don't want to use groupByKey() because it involves too much shuffling.
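One way to avoid grouping values per key is to route each record, within the partition it already lives in, to one open writer per target file name, so keys that sanitise to the same name share a writer instead of replacing each other's output. The sketch below shows only that routing logic in plain Java: routeByFileName() is a hypothetical helper, a StringBuilder stands in for a real file writer, and calling it from something like Spark's JavaRDD.foreachPartition is an assumption, not something shown in the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerPartitionRouting {

    // Same file-name logic as generateFileNameForKeyValue() in the question.
    static String fileName(String key, String name) {
        return key.replaceAll(" ", "").replaceAll("[^a-zA-Z0-9_-]", "") + "." + name + ".out";
    }

    // Hypothetical helper: consumes one partition's (key, value) records and keeps one
    // buffer per target file. A StringBuilder stands in for an open output stream.
    static Map<String, StringBuilder> routeByFileName(Iterator<Map.Entry<String, String>> records,
                                                      String partName) {
        Map<String, StringBuilder> writers = new LinkedHashMap<>();
        while (records.hasNext()) {
            Map.Entry<String, String> record = records.next();
            String file = fileName(record.getKey(), partName);
            // Reuse the existing "writer" for this file instead of starting a new one.
            writers.computeIfAbsent(file, f -> new StringBuilder())
                   .append(record.getValue()).append('\n');
        }
        return writers;
    }

    // Sample partition with two keys that collide on the same file name.
    static Map<String, StringBuilder> demo() {
        List<Map.Entry<String, String>> partition = new ArrayList<>();
        partition.add(new SimpleEntry<>("New York", "row1"));
        partition.add(new SimpleEntry<>("Boston", "row2"));
        partition.add(new SimpleEntry<>("New York!", "row3"));
        return routeByFileName(partition.iterator(), "part-00000");
    }

    public static void main(String[] args) {
        demo().forEach((file, contents) -> System.out.print(file + ":\n" + contents));
    }
}
```

Because this works on the records each partition already holds, it does not shuffle. The same file name can still occur in several partitions, which is why the part name stays in the generated file name.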
My code snippet:
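import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Routes each (key, value) record to an output file whose name is derived from the key.
// CensusData2 is the application's own value class.
public class OutputFormat extends MultipleTextOutputFormat<String, CensusData2> {

    public OutputFormat() {
    }

    // Returning null means only the value is written on each output line, not the key.
    @Override
    protected String generateActualKey(String key, CensusData2 value) {
        return null;
    }

    // Removes spaces and any character outside [a-zA-Z0-9_-] from the key and appends the
    // default leaf name (e.g. part-00000), so different keys can map to the same file.
    @Override
    protected String generateFileNameForKeyValue(String key, CensusData2 value, String name) {
        String fileName = key.replaceAll(" ", "").replaceAll("[^a-zA-Z0-9_-]", "") + "." + name + ".out";
        return fileName;
    }
}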