
I run two MapReduce jobs, and I want the second job to write its results into two different files, in two different directories. I would like something similar to FileInputFormat.addInputPath(.., multiple input path), but for the output.

I'm completely new to MapReduce, and I'm required to write my code for Hadoop 0.21.0. I use context.write(..) in my reduce step, but I don't see how to control multiple output paths...

Thanks for your time!

Here is the reduce code from my first job, to show that I only know how to write a single output (it goes into a /../part* file). What I would like now is to write to two specific files, choosing the file depending on the key:

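import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class; NetflixRating and NetflixUser are the job's own value classes.
public static class NormalizeReducer extends Reducer<LongWritable, NetflixRating, LongWritable, NetflixUser> {
    @Override
    public void reduce(LongWritable key, Iterable<NetflixRating> values, Context context) throws IOException, InterruptedException {
        // key = user ID; collect every rating of that user
        NetflixUser user = new NetflixUser(key.get());
        for (NetflixRating r : values) {
            // Hadoop reuses the value instance across iterations, so store a copy
            user.addRating(new NetflixRating(r));
        }
        user.normalizeRatings();
        user.reduceRatings();
        // Single output: ends up in a part-r-* file under the job's output directory
        context.write(key, user);
    }
}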

EDIT: I tried the approach from the last comment, as you suggested, Amar. I don't know yet whether it works, because I have another problem with my HDFS, but before I forget, here are my findings for the sake of civilization:

http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

  • MultipleOutputs DOES NOT replace FileOutputFormat. You define one output path with FileOutputFormat, and then you can add as many extra outputs as you need with MultipleOutputs.
  • addNamedOutput method: the String namedOutput argument is just a descriptive word, not a path.
  • The actual path is set in the write method, through its String baseOutputPath argument.
Hermes
  • Check out `MultipleOutputs`: http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html – Amar Apr 09 '13 at 19:10
  • Looks like it! I don't know how I didn't find this... I'll try that, thanks! – Hermes Apr 09 '13 at 19:13
  • One more question: according to the documentation, we can only pass a single word to name the output. Why is that? Why can't we pass a complete path, so that we can write to different directories? – Hermes Apr 09 '13 at 20:02
  • Yes, that's a problem with MultipleOutputs! Sadly, I have no answer for it right now either; see my question here: http://stackoverflow.com/questions/15100621/multipletextoutputformat-alternative-in-new-api – Amar Apr 09 '13 at 20:08
  • See the last comment on this answer: http://stackoverflow.com/a/15102476/610305, and see if it helps. – Amar Apr 09 '13 at 20:09
  • Thanks, I'll try it and keep you informed. – Hermes Apr 09 '13 at 20:54
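A partial answer to the "complete path" question in the comments, assuming the new-API MultipleOutputs in 0.21.0 behaves as the linked Javadoc describes: the directory goes into baseOutputPath (which may contain `/`), not into namedOutput, and the resulting files still live under the single output directory set with FileOutputFormat.setOutputPath. The Hadoop-free sketch below only models the per-key routing decision and the file name the framework would derive from it. The even/odd rule, the `even/users` and `odd/users` names, the helper methods, and the `-r-NNNNN` suffix format are all assumptions for illustration.

```java
import java.util.Locale;

// Hypothetical, Hadoop-free model of routing reducer output by key.
// In a real reducer (needs Hadoop on the classpath, not shown) you would create a
// MultipleOutputs<LongWritable, NetflixUser> in setup(), call
// mos.write(key, user, baseOutputPathFor(key.get())) in reduce(), and mos.close() in cleanup().
public class OutputRouting {

    // Hypothetical routing rule: even user IDs go to one subdirectory, odd IDs to another.
    static String baseOutputPathFor(long userId) {
        return (userId % 2 == 0) ? "even/users" : "odd/users";
    }

    // Assumed naming convention: <baseOutputPath>-r-<5-digit partition>,
    // resolved relative to the job's output directory.
    static String resultingFile(String jobOutputDir, String baseOutputPath, int partition) {
        return jobOutputDir + "/" + baseOutputPath + "-r-"
                + String.format(Locale.ROOT, "%05d", partition);
    }

    public static void main(String[] args) {
        long[] userIds = {6L, 7L};
        for (long id : userIds) {
            System.out.println(id + " -> " + resultingFile("/out", baseOutputPathFor(id), 0));
        }
    }
}
```

Running main prints `6 -> /out/even/users-r-00000` and `7 -> /out/odd/users-r-00000`: two keys, two files, two subdirectories, one job output directory.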

1 Answer


I tried the approach from the last comment, as you suggested, Amar. I don't know yet whether it works, because I have another problem with my HDFS, but before I forget, here are my findings for the sake of civilization:

http://archive.cloudera.com/cdh/3/hadoop-0.20.2+228/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

  • MultipleOutputs DOES NOT replace FileOutputFormat. You define one output path with FileOutputFormat, and then you can add as many extra outputs as you need with MultipleOutputs.
  • addNamedOutput method: the String namedOutput argument is just a descriptive word, not a path.
  • The actual path is set in the write method, through its String baseOutputPath argument.

Hermes
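The addNamedOutput point above matches the Javadoc wording, which says the name must be a single word of letters and numbers and cannot be `part` (reserved for the default output). The sketch below is a Hadoop-free model of that stated rule, not Hadoop's own validation code; the class and method names are hypothetical, and restricting the check to ASCII letters and digits is an assumption. It shows why something like `out/users` cannot serve as a named output, so the directory has to go into baseOutputPath instead.

```java
// Hypothetical, Hadoop-free model of the naming rule the MultipleOutputs Javadoc states
// for addNamedOutput(job, namedOutput, ...): one word of letters and digits, never "part".
public class NamedOutputCheck {

    static boolean isValidNamedOutput(String name) {
        if (name == null || name.isEmpty() || name.equals("part")) {
            return false; // "part" is reserved for the job's default output files
        }
        for (char c : name.toCharArray()) {
            boolean asciiLetterOrDigit =
                    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!asciiLetterOrDigit) {
                return false; // so "/", "-", "." and spaces are all rejected
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"normalized", "users2", "part", "out/users"};
        for (String n : candidates) {
            System.out.println(n + " -> " + isValidNamedOutput(n));
        }
    }
}
```

Running main prints `true` for `normalized` and `users2`, and `false` for `part` and `out/users`.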