Output to multiple directories in Hadoop job

Question

I'm trying to process some data and write the results to different directories. I followed the accepted answer in this post (which uses MultipleOutputs): Writing output to different folders hadoop

However, when the named outputs contain slashes, so that multiple directories are created, the output files are empty (the directories and the files are still created). If I just remove the slashes, so that the outputs become different files in the same directory, the files contain the expected data.

Any help will be appreciated.

Snippets of the code:

In the main function:

while ((ll = br.readLine()) != null)
{
    for (Type v : values)
        MultipleOutputs.addNamedOutput(conf, "./" + ll + "/" + v.toString() + "/" + ll, TextOutputFormat.class, Text.class, NullWritable.class);
}

The Reduce class:

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, NullWritable> {
    private MultipleOutputs mos;

    public void configure(JobConf context) {
        mos = new MultipleOutputs(context);
    }

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, NullWritable> output, Reporter reporter) throws IOException {
        while (values.hasNext())
            mos.getCollector(key.toString(), reporter).collect(values.next(), NullWritable.get());
    }
}

The key passed to the reducer is generated in the same format as the named output.
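One way to keep the two strings in step is to build them in a single helper that both the driver loop and the code producing the reduce key call. This is a minimal sketch, assuming the name is formed exactly as in the addNamedOutput call above; OutputNames, namedOutput and the sample values are hypothetical and are not part of Hadoop or of the original job.

```java
// Sketch only: OutputNames is a hypothetical helper, not a Hadoop class.
final class OutputNames {

    private OutputNames() {
    }

    // Builds the name in the same shape as the driver code:
    // "./" + ll + "/" + v.toString() + "/" + ll
    static String namedOutput(String line, String type) {
        return "./" + line + "/" + type + "/" + line;
    }

    public static void main(String[] args) {
        // Hypothetical sample values standing in for ll and v.toString().
        String registered = namedOutput("2013", "TypeA"); // passed to addNamedOutput
        String reduceKey = namedOutput("2013", "TypeA");  // emitted as the reduce key
        System.out.println(registered.equals(reduceKey));
    }
}
```

With a shared helper, a mismatch between the registered name and the key can be ruled out as the cause of the empty files.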

I only added one line to MultipleOutputs to allow '/' and '.':

if ((ch == '/') || (ch == '.')) continue;

in the checkTokenName function.
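The effect of that one-line change can be sketched in isolation. The code below is a standalone stand-in, not the Hadoop source: it assumes the stock check accepts only the characters A-Z, a-z and 0-9, and it returns a boolean where the real method is assumed to throw on an invalid name. TokenNameCheck and its method names are hypothetical.

```java
// Sketch only: a standalone stand-in for the name check described above.
// TokenNameCheck and its methods are hypothetical names, not Hadoop classes.
final class TokenNameCheck {

    private TokenNameCheck() {
    }

    // Assumed stock behaviour: only A-Z, a-z and 0-9 are accepted.
    static boolean isStockValid(String name) {
        return isValid(name, false);
    }

    // Patched behaviour: the added line also lets '/' and '.' through.
    static boolean isPatchedValid(String name) {
        return isValid(name, true);
    }

    private static boolean isValid(String name, boolean allowPathChars) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (char ch : name.toCharArray()) {
            if (ch >= 'A' && ch <= 'Z') continue;
            if (ch >= 'a' && ch <= 'z') continue;
            if (ch >= '0' && ch <= '9') continue;
            // The single line added in the question.
            if (allowPathChars && ((ch == '/') || (ch == '.'))) continue;
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical value of "./" + ll + "/" + v.toString() + "/" + ll
        String name = "./2013/TypeA/2013";
        System.out.println("stock check accepts:   " + isStockValid(name));
        System.out.println("patched check accepts: " + isPatchedValid(name));
    }
}
```

The sketch only shows that the patch makes path-like names pass validation. It says nothing about how the rest of MultipleOutputs uses the name once the check has passed.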

Please show your code, if you can. It's difficult to say something this way. — SSaikia_JtheRocker, Aug 19 '13 at 08:47
Check out the 'Multiplexing Output' section here: http://www.infoq.com/articles/HadoopOutputFormat — Amar, Aug 19 '13 at 09:16

0 Answers