I'm trying to process some data and write the output to different directories. I followed the accepted answer in this post, which uses MultipleOutputs: "Writing output to different folders hadoop".
However, when I create multiple directories, the output files are empty (the directories and the files are still created). If I just remove the slashes (just different files in the same directory), the files contain the expected data.
Any help will be appreciated.
Snapshot of the code:
In the main function:
    while ((ll = br.readLine()) != null)
    {
        for (Type v : values)
            MultipleOutputs.addNamedOutput(conf, "./" + ll + "/" + v.toString() + "/" + ll, TextOutputFormat.class, Text.class, NullWritable.class);
    }
The Reduce class:
    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, NullWritable> {
        private MultipleOutputs mos;

        public void configure(JobConf context)
        {
            mos = new MultipleOutputs(context);
        }

        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, NullWritable> output, Reporter reporter) throws IOException {
            while (values.hasNext())
                mos.getCollector(key.toString(), reporter).collect(values.next(), NullWritable.get());
        }
    }
The key passed to the reducer is generated in the same format as the named output.
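To make that format concrete, here is a minimal sketch of building the named output and the reducer key from one shared helper, so the two cannot drift apart. The class name `NamedOutputNames`, the method `build`, and the sample values are hypothetical; only the string layout is taken from the `addNamedOutput` call above.

```java
// Hypothetical helper: builds the name used both for addNamedOutput(...)
// and for the key emitted by the mapper, so they always match.
final class NamedOutputNames {
    private NamedOutputNames() {}

    // Same layout as in the main function: "./" + ll + "/" + v + "/" + ll
    static String build(String ll, String type) {
        return "./" + ll + "/" + type + "/" + ll;
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        System.out.println(build("2013", "A"));
    }
}
```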
The only change I made to MultipleOutputs was adding one line to its checkTokenName function, to allow '/' and '.':
    if ((ch == '/') || (ch == '.')) continue;
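For context, the patched check behaves roughly like the sketch below. It is a standalone reconstruction, assuming the stock check accepts only ASCII letters and digits, with the extra line added. It is not copied from the Hadoop source, and the class name `TokenNameCheck` and the error messages are assumptions.

```java
// Standalone approximation of the patched name check (not the Hadoop source).
final class TokenNameCheck {
    private TokenNameCheck() {}

    static void checkTokenName(String namedOutput) {
        if (namedOutput == null || namedOutput.length() == 0) {
            throw new IllegalArgumentException("Name cannot be null or empty");
        }
        for (char ch : namedOutput.toCharArray()) {
            if ((ch >= 'A') && (ch <= 'Z')) continue;
            if ((ch >= 'a') && (ch <= 'z')) continue;
            if ((ch >= '0') && (ch <= '9')) continue;
            // The one added line: also accept path separators and dots.
            if ((ch == '/') || (ch == '.')) continue;
            throw new IllegalArgumentException("Name cannot have a '" + ch + "' char");
        }
    }
}
```

With this change a name such as "./2013/A/2013" passes validation, while a name containing any other punctuation (for example an underscore) is still rejected.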