
I want to extract distinct values from a reducer's output. For that, I wrote a separate mapper and reducer. That is, I have one mapper-reducer pair that generates a training file and a second mapper-reducer pair that produces the distinct rows from that training file. I want to keep both files for testing purposes. How can I use the first reducer's output as the input to the second mapper?

cloud_anny
    Possible duplicate of [Chaining multiple MapReduce jobs in Hadoop](http://stackoverflow.com/questions/2499585/chaining-multiple-mapreduce-jobs-in-hadoop) – Ravindra babu Feb 18 '16 at 18:02

2 Answers


You can do this easily: just pass the output directory of the 1st job as the input directory to the 2nd job. I call it outputTempDir in this example:

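import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

String inputDir = "/input";
String outputTempDir = "/output/Temp";
String outputFinalDir = "/output/Final";

Configuration conf = new Configuration();

// Job 1: reads inputDir and writes its result to the temporary directory.
Job job1 = Job.getInstance(conf, "JOB_1");
job1.setMapperClass(Mapper1.class);
job1.setReducerClass(Reducer1.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job1, new Path(inputDir));
FileOutputFormat.setOutputPath(job1, new Path(outputTempDir));

// Block until job 1 finishes; start job 2 only if it succeeded.
boolean success = job1.waitForCompletion(true);
if (success) {
    // Job 2: uses job 1's output directory as its input directory.
    Job job2 = Job.getInstance(conf, "JOB_2");
    job2.setMapperClass(Mapper2.class);
    job2.setReducerClass(Reducer2.class);
    // Reads each line of job 1's text output as a (key, value) pair.
    job2.setInputFormatClass(KeyValueTextInputFormat.class);
    FileInputFormat.addInputPath(job2, new Path(outputTempDir));
    FileOutputFormat.setOutputPath(job2, new Path(outputFinalDir));
    success = job2.waitForCompletion(true);
}

return success;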

Make sure that the output format of the 1st job is compatible with the input format of the 2nd job.
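To illustrate what "compatible" means here: with the default text output, job 1 writes each record as a line of the form key, tab, value, and KeyValueTextInputFormat in job 2 splits each line at the first tab (a line without a tab becomes the key with an empty value). The sketch below only mimics that splitting rule in plain Java, without Hadoop; the class name, the helper `splitAtFirstTab`, and the sample lines are hypothetical and are not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class KeyValueLineSketch {

    // Mimics the default splitting rule: everything before the first tab is
    // the key, everything after it is the value. A line without a tab is
    // treated as a key with an empty value.
    static Map.Entry<String, String> splitAtFirstTab(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleImmutableEntry<>(line, "");
        }
        return new SimpleImmutableEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        // Hypothetical lines, shaped like the text output of job 1.
        String[] lines = {
            "row1\tfeatureA,featureB",
            "row2\tx\ty",      // only the first tab separates key from value
            "lineWithoutTab"
        };
        for (String line : lines) {
            Map.Entry<String, String> kv = splitAtFirstTab(line);
            System.out.println("key=[" + kv.getKey() + "] value=[" + kv.getValue() + "]");
        }
    }
}
```

If job 1 wrote a different format instead (for example SequenceFileOutputFormat), job 2 would need the matching input format rather than KeyValueTextInputFormat.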

maxteneff

Maybe you need a workflow scheduler such as Oozie.

Oozie lets you define a workflow of tasks and feed the output of one task into another task as its input.

Oozie documentation: https://oozie.apache.org/docs/4.2.0/index.html

Oozie also comes with a web console for viewing workflow jobs and their status.

DanielVL
  • Is there any MapReduce-only solution? I read about chained MapReduce. Is this possible with chained MapReduce? – cloud_anny Feb 18 '16 at 08:07
  • Chaining lets you add extra map steps to a single MapReduce job. The allowed pattern is [MAP+ / REDUCE MAP*]. If you want more than one reduce step, Oozie is the solution. – DanielVL Feb 18 '16 at 08:13