I want to extract distinct values from a reducer's output. To do that, I wrote a separate mapper and reducer. That is, one mapper-reducer pair generates a training file, and a second mapper-reducer pair produces the distinct rows from that training file. I want to keep both files for testing purposes. How can I use the first reducer's output as the input to the second mapper?
2
-
Possible duplicate of [Chaining multiple MapReduce jobs in Hadoop](http://stackoverflow.com/questions/2499585/chaining-multiple-mapreduce-jobs-in-hadoop) – Ravindra babu Feb 18 '16 at 18:02
2 Answers
3
You can do this easily: just pass the output directory of the 1st job as the input directory to the 2nd job. I call it outputTempDir in this example:
String inputDir = "/input";
String outputTempDir = "/output/Temp";
String outputFinalDir = "/output/Final";
Configuration conf = new Configuration();
// Job 1 reads the raw input and writes its result to the temporary directory.
Job job1 = Job.getInstance(conf, "JOB_1");
job1.setMapperClass(Mapper1.class);
job1.setReducerClass(Reducer1.class);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job1, new Path(inputDir));
FileOutputFormat.setOutputPath(job1, new Path(outputTempDir));
boolean success = job1.waitForCompletion(true);
if (success) {
    // Job 2 starts only after job 1 succeeds and reads job 1's output directory.
    Job job2 = Job.getInstance(conf, "JOB_2");
    job2.setMapperClass(Mapper2.class);
    job2.setReducerClass(Reducer2.class);
    job2.setInputFormatClass(KeyValueTextInputFormat.class);
    FileInputFormat.addInputPath(job2, new Path(outputTempDir));
    FileOutputFormat.setOutputPath(job2, new Path(outputFinalDir));
    success = job2.waitForCompletion(true);
}
return success;
Make sure that the output format of the 1st job is compatible with the input format of the 2nd job.
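To make the compatibility point concrete: job 1 above does not set an output format, so it uses Hadoop's default `TextOutputFormat`, which writes each record as key, tab, value. `KeyValueTextInputFormat` splits each line at the first tab by default, so the two line up. Below is a plain-Java sketch, with no Hadoop dependencies, that mimics this hand-off through a temporary directory and then keeps only distinct rows, as a second job might. The class `ChainSketch` and its method names are hypothetical and exist only for illustration; they are not Hadoop API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only: no Hadoop classes are used here.
final class ChainSketch {

    // Mimics a tab-separated text record: key, TAB, value.
    static String formatRecord(String key, String value) {
        return key + "\t" + value;
    }

    // Splits at the FIRST tab only; a line without a tab becomes (line, "").
    static String[] splitRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    // Stand-in for job 1: writes all records to one part file in tempDir.
    static void stageOne(List<String[]> records, Path tempDir) {
        List<String> lines = new ArrayList<>();
        for (String[] record : records) {
            lines.add(formatRecord(record[0], record[1]));
        }
        try {
            Files.createDirectories(tempDir);
            Files.write(tempDir.resolve("part-r-00000"), lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for job 2: reads every file in tempDir, keeps distinct (key, value) rows.
    static Set<List<String>> stageTwo(Path tempDir) {
        Set<List<String>> distinct = new LinkedHashSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tempDir)) {
            for (Path file : files) {
                for (String line : Files.readAllLines(file)) {
                    distinct.add(Arrays.asList(splitRecord(line)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return distinct;
    }
}
```

Because the split happens at the first tab only, a value that itself contains tabs survives the round trip, while a key that contains a tab would not. If job 1 emitted such keys, a different separator or a binary format would be a safer choice.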
-
Thanks @maxteneff. Your solution is very easy to understand. That solved my problem. – cloud_anny Feb 19 '16 at 05:34
0
Maybe you need a job scheduler like Oozie.
Oozie lets you define a workflow of tasks and feed the output of one task into another task as its input.
Oozie documentation: https://oozie.apache.org/docs/4.2.0/index.html
Oozie also provides a UI for working with schedules visually.

DanielVL
-
Is there a pure MapReduce solution? I read about chained MapReduce. Is it possible with that? – cloud_anny Feb 18 '16 at 08:07
-
Chaining lets you add extra map steps to a single MapReduce job. It is defined by [MAP+ / REDUCE MAP*]. If you need more than one reduce, Oozie is the solution. – DanielVL Feb 18 '16 at 08:13
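The [MAP+ / REDUCE MAP*] shape can be sketched in plain Java: one or more map steps, exactly one reduce, then zero or more map steps. This is an illustration of the shape only, not Hadoop's chaining API; the class `ChainShapeSketch`, its methods, and the simplification of records to plain strings with one-to-one map steps are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Plain-Java illustration of the [MAP+ / REDUCE MAP*] shape; not Hadoop API.
final class ChainShapeSketch {

    // Applies each per-record step in order, like map steps chained in one task.
    static List<String> applyMaps(List<String> records, List<UnaryOperator<String>> maps) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String current = record;
            for (UnaryOperator<String> map : maps) {
                current = map.apply(current);
            }
            out.add(current);
        }
        return out;
    }

    // One job: at least one map, then exactly one reduce, then optional maps.
    static List<String> runJob(List<String> input,
                               List<UnaryOperator<String>> preMaps,
                               Function<List<String>, List<String>> reduce,
                               List<UnaryOperator<String>> postMaps) {
        if (preMaps.isEmpty()) {
            throw new IllegalArgumentException("MAP+ requires at least one map step");
        }
        List<String> mapped = applyMaps(input, preMaps);
        List<String> reduced = reduce.apply(mapped);
        return applyMaps(reduced, postMaps);
    }

    // Example reduce: sorts and drops duplicates, standing in for a "distinct" reducer.
    static List<String> distinct(List<String> records) {
        return new ArrayList<>(new TreeSet<>(records));
    }
}
```

There is only one slot for a reduce in this shape, so a pipeline with two reduce phases, such as the one in the question, needs two jobs, whether they are run back to back from a driver or scheduled by a workflow tool.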