
I'm trying to call a MapReduce job from a Java application. In earlier Hadoop versions (1.x), I created a Configuration object and a Job object, set mapred.job.tracker and fs.default.name in the Configuration, and ran the Job.
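
The 1.x pattern looked roughly like this (the class name, hosts, and ports below are placeholders, not my actual setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OldStyleSubmitter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 1.x JobTracker address; placeholder host and port
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021");
        // default filesystem; placeholder host
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        Job job = new Job(conf);
        // ... set mapper, reducer, input/output ...
        job.waitForCompletion(true);
    }
}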

Now, in Hadoop 2.x, the JobTracker no longer exists, nor does there seem to be any documentation on how to run MR jobs programmatically. Any ideas?

What I'm looking for is an explanation like the one given here: call mapreduce from a java program

user3570620

2 Answers


You'll need three things:

// the ResourceManager address, as defined in your yarn-site.xml
conf.set("yarn.resourcemanager.address", "yarn-manager.com:50001");

// the framework is now "yarn", defined like this in your mapred-site.xml
conf.set("mapreduce.framework.name", "yarn");

// the default filesystem, as defined in core-site.xml
// (fs.default.name is deprecated in 2.x; the new key is fs.defaultFS)
conf.set("fs.default.name", "hdfs://namenode.com:9000");

There is a more detailed explanation in the Hadoop 2.2.0 documentation.

Thomas Jungblut

You need to write a Driver class extending org.apache.hadoop.conf.Configured and implementing org.apache.hadoop.util.Tool.

Here is a sample implementation of the Driver class (named EtlTool below). Please note that you need to have hdfs-site.xml and the other configuration files on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class EtlTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf);
        // ... configure mapper, reducer, and input/output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addResource("core-site.xml");
        conf.addResource("hdfs-site.xml");
        conf.addResource("hive-site.xml");
        int res = ToolRunner.run(conf, new EtlTool(), args);
        System.exit(res);
    }
}
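
Running the driver through ToolRunner has a side benefit: GenericOptionsParser picks up standard switches such as -D key=value and -conf <file>, so you can override configuration at launch time without recompiling. For example (the jar name and paths are placeholders):

hadoop jar etl.jar EtlTool -D mapreduce.framework.name=yarn /input /output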
akshat thakar