I am running multiple chained jobs and it works quite well. The problem appears when the third job finishes its execution: it returns the expected output, but the application doesn't exit. Every time, I have to press Ctrl+C to exit. This is my main method:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: app <in> <out>");
        System.exit(2);
    }
    // first job
    ControlledJob cjob1 = new ControlledJob(conf);
    cjob1.setJobName("First Job");
    Job job1 = cjob1.getJob();
    job1.setJarByClass(MultipleJobs.class);
    job1.setMapperClass(Mapper1.class);
    job1.setReducerClass(Reducer1.class);
    job1.setMapOutputKeyClass(Text.class);
    job1.setMapOutputValueClass(Text.class);
    job1.setOutputKeyClass(NullWritable.class);
    job1.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job1, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job1, new Path("temp1"));
    // second job
    ControlledJob cjob2 = new ControlledJob(conf);
    cjob2.setJobName("SecondJob");
    cjob2.addDependingJob(cjob1);
    Job job2 = cjob2.getJob();
    job2.setJarByClass(MultipleJobs.class);
    job2.setMapperClass(Mapper2.class);
    job2.setCombinerClass(Reducer2.class);
    job2.setReducerClass(Reducer2.class);
    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job2, new Path("temp1"));
    FileOutputFormat.setOutputPath(job2, new Path("temp2"));
    // third job
    ControlledJob cjob3 = new ControlledJob(conf);
    cjob3.setJobName("Third Job");
    cjob3.addDependingJob(cjob2);
    Job job3 = cjob3.getJob();
    job3.setJarByClass(MultipleJobs.class);
    job3.setReducerClass(Reducer3.class);
    job3.setMapperClass(Mapper3.class);
    job3.setMapOutputKeyClass(NullWritable.class);
    job3.setMapOutputValueClass(Text.class);
    job3.setOutputKeyClass(Text.class);
    job3.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job3, new Path("temp2"));
    FileOutputFormat.setOutputPath(job3, new Path(otherArgs[1]));
    // register the three jobs with the controller and run it
    JobControl control = new JobControl("Controller");
    control.addJob(cjob1);
    control.addJob(cjob2);
    control.addJob(cjob3);
    control.run();
}
And this is the launch command:
hadoop jar MJ.jar MultipleJobs input output
Is this the right way to chain multiple jobs? What should I add or change so that I no longer have to press Ctrl+C at the end of the whole execution?
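For context, my understanding is that JobControl.run() keeps looping until stop() is called on the controller, which would explain why main never returns. A pattern I have seen suggested is to run the controller on its own thread, poll allFinished(), and then call stop(). The sketch below shows only that control flow and needs no Hadoop on the classpath: Controller and ToyController are hypothetical stand-ins I made up for illustration, mirroring the run(), allFinished() and stop() methods of the real JobControl. The class name, thread name and timings are assumptions too.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class JobControlStopSketch {

    /** Hypothetical stand-in for the three JobControl methods the pattern relies on. */
    public interface Controller extends Runnable {
        boolean allFinished();
        void stop();
    }

    /** Toy controller: its run() loops until stop() is called, like the hang described above. */
    public static class ToyController implements Controller {
        private final AtomicInteger remaining;
        private volatile boolean stopRequested = false;

        public ToyController(int jobs) {
            remaining = new AtomicInteger(jobs);
        }

        @Override
        public void run() {
            while (!stopRequested) {
                if (remaining.get() > 0) {
                    remaining.decrementAndGet(); // pretend one job just finished
                }
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        @Override
        public boolean allFinished() {
            return remaining.get() == 0;
        }

        @Override
        public void stop() {
            stopRequested = true;
        }
    }

    /** Runs the controller on its own thread, waits for all jobs, then stops it. */
    public static boolean runUntilFinished(Controller control, long pollMillis) {
        Thread runner = new Thread(control, "job-control");
        runner.setDaemon(true); // a stuck controller thread cannot keep the JVM alive
        runner.start();
        try {
            while (!control.allFinished()) {
                Thread.sleep(pollMillis);
            }
            control.stop();      // ends the controller's loop
            runner.join(5000);   // wait up to 5 seconds for the thread to finish
        } catch (InterruptedException e) {
            control.stop();
            Thread.currentThread().interrupt();
        }
        return !runner.isAlive();
    }

    public static void main(String[] args) {
        boolean exited = runUntilFinished(new ToyController(3), 20);
        System.out.println("controller thread exited: " + exited);
    }
}
```

If this pattern is right, then in my main method the final control.run() call would become a thread start followed by the same polling loop and control.stop(), with the exit code presumably derived from control.getFailedJobList().isEmpty().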