
I'm trying to run a Hadoop job on a local/remote cluster. In the future this job will be submitted from a web application. I'm trying to execute this piece of code from Eclipse:

package org.mmm.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TestHadoop {

    private final static String host = "localhost";

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        run();
    }

    static void run() throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();

        // run on other machine/cluster
        conf.set("fs.default.name", "hdfs://" + host + ":8020");
        conf.set("mapred.job.tracker", "hdfs://" + host + ":8021");

        Job job = new Job(conf, "Wordcount");
        job.setJarByClass(TestHadoop.class);

        FileInputFormat.addInputPath(job, new Path("/user/hue/jobsub/sample_data/midsummer.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/tmp/hadoop-out2"));

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.waitForCompletion(true);
    }

    static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { 
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { 

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException,
                InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}

However, I get the following errors:

2011-09-30 16:32:39,000 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.mmm.hadoop.TestHadoop$Map
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.ClassNotFoundException: org.mmm.hadoop.TestHadoop$Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:994)
... 8 more

16:33:01.209 [LeaseChecker] DEBUG org.apache.hadoop.hdfs.DFSClient - LeaseChecker is interrupted.
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method) [na:1.7.0]
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1167) ~[hadoop-core-0.20.2-cdh3u1.jar:na]
    at java.lang.Thread.run(Thread.java:722) [na:1.7.0]

I'm using CDH3 with Hue. The jobs appear on the job list with the "Error running child" failure above.

mmatloka
  • Hadoop can be run in [3 modes](http://hadoop.apache.org/common/docs/r0.21.0/single_node_setup.html#Prepare+to+Start+the+Hadoop+Cluster) - which mode are you running? I was able to get 'Local (Standalone) Mode' running in Eclipse without any exceptions, but not the other modes. Remove the configuration files and Hadoop will default to 'Local (Standalone) Mode'. – Praveen Sripati Sep 30 '11 at 16:17
  • I'm using Hadoop in pseudo-distributed mode. I don't want local standalone mode, because in the future this Java app should run jobs on a 'not local' Hadoop cluster, and I'm trying to find a way to achieve that. – mmatloka Sep 30 '11 at 20:40
  • Mich - The jar/class files for the map/reduce functions are not visible to the TaskTracker, hence the exception. I posted a query in the Hadoop groups and it looks like no one has tried this successfully before. You might want to raise it again in the Hadoop groups. Let us know here if you are able to run it successfully. – Praveen Sripati Oct 01 '11 at 01:33
  • Hmm, if no one has tried this successfully before, how are Hadoop clusters usually integrated with other applications? – mmatloka Oct 01 '11 at 07:39
  • @Praveen Sripati can you send me a link of this topic on Hadoop groups? – mmatloka Oct 01 '11 at 07:50
  • Got a bit confused - the query was for MRv2 - http://goo.gl/sckmU - but the exceptions I am getting for MRv1 and MRv2 are the same. – Praveen Sripati Oct 01 '11 at 08:38
  • My goal is to run it not only from Eclipse, but in the future as a normal function from a web application .war. – mmatloka Oct 01 '11 at 09:17
  • @mich Did you ever get the pseudo-distributed Hadoop working? I'm starting to learn Hadoop and thought I'd install it in pseudo-distributed mode on an old desktop where I've installed Linux. – user949300 Dec 23 '11 at 17:34
  • How did you generate/package the jar for the code? As the error says, the Map class is not visible. And did you create a jar in the first place? – abhinav Mar 21 '13 at 06:58

3 Answers


You have to bundle your custom mapper/reducer implementations in a jar.

job.setJarByClass(TestHadoop.class); 

will then look up that jar and ship it to the cluster.
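
If the job is launched straight from Eclipse (no `hadoop jar` invocation), one workaround is to export the project as a jar first and point the job configuration at it explicitly. A minimal sketch, assuming a hypothetical jar path:

```java
// Export the project as a jar (e.g. Eclipse: File > Export > JAR file),
// then reference it so the TaskTrackers can load TestHadoop$Map / TestHadoop$Reduce.
// "/path/to/testhadoop.jar" is a placeholder for wherever the jar was exported.
conf.set("mapred.jar", "/path/to/testhadoop.jar");
```

This is the same property `job.setJarByClass(...)` tries to fill in by locating the jar that contains the given class, which fails when the classes only exist as loose `.class` files in the IDE's build directory.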

oae

I know I'm probably way too late, but try declaring Map and Reduce as public, too.
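
That is, the nested classes would be declared as:

```java
// Public visibility so Hadoop can instantiate them reflectively.
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { /* ... */ }

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { /* ... */ }
```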

Chris Gerken

The mapred.job.tracker value should be a plain host:port address, not an hdfs:// URL.

Also make the Mapper and Reducer classes public.

// error
conf.set("mapred.job.tracker", "hdfs://" + host + ":8021");
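
A corrected configuration might look like this (a sketch; the ports assume the CDH3 defaults from the question):

```java
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://" + host + ":8020"); // NameNode: hdfs:// scheme
conf.set("mapred.job.tracker", host + ":8021");          // JobTracker: plain host:port
```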
Radim Köhler
user2793692