
I'm trying to run multiple chained jobs, and it works quite well. The problem is what happens when the third job finishes: it produces the expected output, but the application doesn't exit. Every time, I have to press Ctrl+C to exit. This is my main method:

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: app <in> <out>");
        System.exit(2);
    }
    // first job
    ControlledJob cjob1 = new ControlledJob(conf);
    cjob1.setJobName("First Job");
    Job job1 = cjob1.getJob();

    job1.setJarByClass(MultipleJobs.class);
    job1.setMapperClass(Mapper1.class);
    job1.setReducerClass(Reducer1.class);
    job1.setMapOutputKeyClass(Text.class);
    job1.setMapOutputValueClass(Text.class);
    job1.setOutputKeyClass(NullWritable.class);
    job1.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job1, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job1, new Path("temp1"));

    // second job
    ControlledJob cjob2 = new ControlledJob(conf);
    cjob2.setJobName("SecondJob");
    cjob2.addDependingJob(cjob1); 
    Job job2 = cjob2.getJob();

    job2.setJarByClass(MultipleJobs.class);
    job2.setMapperClass(Mapper2.class);
    job2.setCombinerClass(Reducer2.class);
    job2.setReducerClass(Reducer2.class);
    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job2, new Path("temp1"));
    FileOutputFormat.setOutputPath(job2, new Path("temp2"));

    // third job
    ControlledJob cjob3 = new ControlledJob(conf);
    cjob3.setJobName("Third Job");
    cjob3.addDependingJob(cjob2); 
    Job job3 = cjob3.getJob();

    job3.setJarByClass(MultipleJobs.class);
    job3.setReducerClass(Reducer3.class);
    job3.setMapperClass(Mapper3.class);
    job3.setMapOutputKeyClass(NullWritable.class);
    job3.setMapOutputValueClass(Text.class);
    job3.setOutputKeyClass(Text.class);
    job3.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job3, new Path("temp2"));
    FileOutputFormat.setOutputPath(job3, new Path(otherArgs[1]));

    JobControl control = new JobControl("Controller");
    control.addJob(cjob1);
    control.addJob(cjob2);
    control.addJob(cjob3);

    control.run();
}

and the launch command:

 hadoop jar MJ.jar MultipleJobs input output

Is this the right way to chain multiple jobs? What should I add or change so that I don't have to press Ctrl+C at the end of the whole execution?

artBCode
  • Please check out the solution provided at http://stackoverflow.com/questions/12374928/hadoop-mapreduce-chain-jobs-jobcontrol-doesnt-stop – Arun A K Apr 22 '14 at 08:40
  • I am using Hadoop 2.2.0. I tried that solution, but now I get "Still running..." and it still doesn't stop. – artBCode Apr 22 '14 at 09:35
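Why `control.run()` never returns: Hadoop's `JobControl` implements `Runnable`, and its `run()` method keeps polling its job lists until `stop()` is called, even after the last job has finished. So calling `control.run()` directly at the end of `main` blocks forever. The usual workaround is to start the controller on its own thread (`new Thread(control).start()`), poll `control.allFinished()` from `main`, and then call `control.stop()`. Since the real thing needs a Hadoop cluster to try, the following is a minimal plain-Java sketch of that same shape. `StandInJobControl`, `runAll` and the placeholder jobs are hypothetical names, not Hadoop APIs, and the stand-in runs jobs strictly in the order they were added instead of tracking dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class JobControlPattern {

    /**
     * Hypothetical stand-in for Hadoop's JobControl (not a Hadoop API).
     * Like JobControl, run() keeps polling until stop() is called, even
     * after the last job has finished. Unlike JobControl, it simply runs
     * jobs one at a time in the order they were added.
     */
    static class StandInJobControl implements Runnable {
        private final List<Callable<Boolean>> pending = new ArrayList<>();
        private final List<Boolean> results = new ArrayList<>();
        private volatile boolean stopped = false;

        synchronized void addJob(Callable<Boolean> job) {
            pending.add(job);
        }

        /** True once every added job has finished (successfully or not). */
        synchronized boolean allFinished() {
            return pending.isEmpty();
        }

        synchronized List<Boolean> getResults() {
            return new ArrayList<>(results);
        }

        void stop() {
            stopped = true;
        }

        @Override
        public void run() {
            while (!stopped) { // the only way out of this loop is stop()
                Callable<Boolean> next;
                synchronized (this) {
                    next = pending.isEmpty() ? null : pending.get(0);
                }
                if (next == null) {
                    sleepQuietly(10); // nothing left to do, but keep polling
                    continue;
                }
                boolean ok;
                try {
                    ok = next.call();
                } catch (Exception e) {
                    ok = false; // a failing job is recorded, not rethrown
                }
                synchronized (this) {
                    pending.remove(0); // removed only after the job has finished
                    results.add(ok);
                }
            }
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs the controller on its own thread, waits for all jobs, then stops it. */
    static List<Boolean> runAll(StandInJobControl control) {
        Thread runner = new Thread(control, "job-control");
        runner.start();
        while (!control.allFinished()) {
            sleepQuietly(20);
        }
        control.stop(); // without this, the runner thread would never end
        try {
            runner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return control.getResults();
    }

    public static void main(String[] args) {
        StandInJobControl control = new StandInJobControl();
        control.addJob(() -> true); // placeholder for job1
        control.addJob(() -> true); // placeholder for job2
        control.addJob(() -> true); // placeholder for job3
        System.out.println("results = " + runAll(control));
        // main returns here and the JVM exits, because the runner thread has ended
    }
}
```

In the question's `main`, this thread/poll/stop sequence would take the place of the final `control.run()` call.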

1 Answer


Yes, you can chain multiple jobs like this. Check this.

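In order to avoid pressing Ctrl+C, you can run the jobs one after another with waitForCompletion(true) instead of using JobControl. waitForCompletion blocks until the job has finished, so main can exit with the result of the last job: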

/* Entire configuration for job1 */
if (!job1.waitForCompletion(true)) {
    System.exit(1);
}

/* Entire configuration for job2 */
if (!job2.waitForCompletion(true)) {
    System.exit(1);
}

/* Entire configuration for job3 */
System.exit(job3.waitForCompletion(true) ? 0 : 1);

UPDATE

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: app <in> <out>");
        System.exit(2);
    }

    // first job: waitForCompletion(true) blocks until the job has finished
    Job job1 = Job.getInstance(conf, "job1");
    job1.setJarByClass(MultipleJobs.class);
    job1.setMapperClass(Mapper1.class);
    job1.setReducerClass(Reducer1.class);
    job1.setMapOutputKeyClass(Text.class);
    job1.setMapOutputValueClass(Text.class);
    job1.setOutputKeyClass(NullWritable.class);
    job1.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job1, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job1, new Path("temp1"));
    if (!job1.waitForCompletion(true)) {
        System.exit(1); // don't start job2 without job1's output
    }

    // second job: reads job1's output. getConf() only exists in a Tool,
    // so the same Configuration is reused (Job.getInstance copies it).
    Job job2 = Job.getInstance(conf, "job2");
    job2.setJarByClass(MultipleJobs.class);
    job2.setMapperClass(Mapper2.class);
    job2.setCombinerClass(Reducer2.class);
    job2.setReducerClass(Reducer2.class);
    job2.setMapOutputKeyClass(Text.class);
    job2.setMapOutputValueClass(IntWritable.class);
    job2.setOutputKeyClass(Text.class);
    job2.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job2, new Path("temp1"));
    FileOutputFormat.setOutputPath(job2, new Path("temp2"));
    if (!job2.waitForCompletion(true)) {
        System.exit(1);
    }

    // third job: reads job2's output and writes the final result
    Job job3 = Job.getInstance(conf, "job3");
    job3.setJarByClass(MultipleJobs.class);
    job3.setReducerClass(Reducer3.class);
    job3.setMapperClass(Mapper3.class);
    job3.setMapOutputKeyClass(NullWritable.class);
    job3.setMapOutputValueClass(Text.class);
    job3.setOutputKeyClass(Text.class);
    job3.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job3, new Path("temp2"));
    FileOutputFormat.setOutputPath(job3, new Path(otherArgs[1]));

    // main is void, so exit with the status code instead of returning it
    System.exit(job3.waitForCompletion(true) ? 0 : 1);
}
USB
  • My main returns void, so I used `Boolean j3o = job3.waitForCompletion(true);` at the end of the third job's configuration and `System.exit(j3o ? 0 : 1);` as the last line of my main. I've put the other two waitForCompletion calls at the end of each job's configuration. As before, all jobs finish, but now I get `INFO jobcontrol.ControlledJob: First job got an error while submitting java.lang.IllegalStateException: Job in state RUNNING instead of DEFINE` – artBCode Apr 22 '14 at 11:34
  • Can you use `Job job = new Job(conf)`? – USB Apr 22 '14 at 11:52
  • For me it is working fine. I have multiple jobs running. – USB Apr 22 '14 at 11:52
  • Can you just try my update? I am doing it that way, and my jobs succeed. – USB Apr 22 '14 at 11:59
  • Thank you, now it is OK. Do you know the conceptual difference between JobControl and waitForCompletion? I'm wondering because the other post uses JobControl too. My guess is that waitForCompletion is a blocking call, while JobControl can also be used to launch concurrent jobs. – artBCode Apr 22 '14 at 13:21
  • I'm afraid the answer selected as correct doesn't fit what was asked. Although it works, it changes the proposed architecture. The solution should address the problem with `JobControl`. I'm facing the same issue and still haven't figured out how to address it. I need JobControl because some jobs run in parallel. – Bruno Ambrozio Mar 03 '20 at 21:22
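On the conceptual difference: `waitForCompletion(true)` submits one job and blocks until it finishes, so a chain of those calls runs strictly one job at a time. `JobControl` instead submits every `ControlledJob` whose depending jobs (added with `addDependingJob`) have all succeeded, so jobs with no dependency between them can run at the same time. The plain-Java sketch below only illustrates that dependency idea. It uses `CompletableFuture`, a JDK mechanism swapped in for illustration rather than a Hadoop API, and `runStep` is a hypothetical stand-in for "submit one job and wait for it". It assumes jobs A and B are independent and C needs the output of both.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

public class DependencySketch {

    /** Hypothetical stand-in for "submit one MapReduce job and wait for it". */
    static boolean runStep(String name, List<String> finished) {
        try {
            Thread.sleep(100); // pretend the job takes a while
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        finished.add(name);
        return true;
    }

    /** Runs A and B concurrently, then C; returns step names in completion order. */
    static List<String> runGraph() {
        List<String> finished = new CopyOnWriteArrayList<>();
        // A and B do not depend on each other, so both start right away.
        CompletableFuture<Boolean> a = CompletableFuture.supplyAsync(() -> runStep("A", finished));
        CompletableFuture<Boolean> b = CompletableFuture.supplyAsync(() -> runStep("B", finished));
        // C starts only after A and B have both completed, and only if both succeeded.
        CompletableFuture<Boolean> c =
                a.thenCombine(b, (okA, okB) -> okA && okB && runStep("C", finished));
        c.join(); // block the caller until the whole graph is done
        return finished;
    }

    public static void main(String[] args) {
        System.out.println("finished in order: " + runGraph());
    }
}
```

A and B may finish in either order, but C always comes last. The `CompletableFuture` worker threads are daemon threads, so nothing needs to be stopped before `main` returns; with `JobControl`, the explicit `stop()` call plays that role.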