
I'm trying to run a Hadoop job on a local/remote cluster. In the future this job will be submitted from a web application. I'm trying to execute this piece of code from Eclipse:

package org.mmm.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TestHadoop {

    private final static String host = "localhost";

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        run();
    }

    static void run() throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();

        // run on other machine/cluster
        conf.set("fs.default.name", "hdfs://" + host + ":8020");
        conf.set("mapred.job.tracker", "hdfs://" + host + ":8021");

        Job job = new Job(conf, "Wordcount");
        job.setJarByClass(TestHadoop.class);

        FileInputFormat.addInputPath(job, new Path("/user/hue/jobsub/sample_data/midsummer.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/tmp/hadoop-out2"));

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.waitForCompletion(true);
    }

    static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { 
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { 

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException,
                InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}

However, I get the following errors:

2011-09-30 16:32:39,000 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.mmm.hadoop.TestHadoop$Map
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:602)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.ClassNotFoundException: org.mmm.hadoop.TestHadoop$Map
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:994)
... 8 more

16:33:01.209 [LeaseChecker] DEBUG org.apache.hadoop.hdfs.DFSClient - LeaseChecker is interrupted.
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method) [na:1.7.0]
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1167) ~[hadoop-core-0.20.2-cdh3u1.jar:na]
    at java.lang.Thread.run(Thread.java:722) [na:1.7.0]

I'm using CDH3 with Hue. The jobs appear on the job list with the "Error running child" failure above.

mmatloka
  • Hadoop can be run in [3 modes](http://hadoop.apache.org/common/docs/r0.21.0/single_node_setup.html#Prepare+to+Start+the+Hadoop+Cluster) - which mode are you running? I was able to get 'Local (Standalone) Mode' running in Eclipse without any exceptions, but not the other modes. Remove the configuration files and Hadoop will default to 'Local (Standalone) Mode'. – Praveen Sripati Sep 30 '11 at 16:17
  • I'm using Hadoop in pseudo-distributed mode. I don't want local standalone mode, because in the future this Java app should run jobs on a 'not local' Hadoop cluster, and I'm trying to find a way to achieve that. – mmatloka Sep 30 '11 at 20:40
  • Mich - The jar/class files for the map/reduce functions are not visible to the TaskTracker, hence the exception. I posted a query in the Hadoop groups and it looks like no one has tried this successfully before. You might want to raise it again in the Hadoop groups. Let us know here if you are able to run it successfully. – Praveen Sripati Oct 01 '11 at 01:33
  • Hmm, if no one has tried this successfully before, how are Hadoop clusters usually integrated with other applications? – mmatloka Oct 01 '11 at 07:39
  • @Praveen Sripati can you send me a link of this topic on Hadoop groups? – mmatloka Oct 01 '11 at 07:50
  • Got a bit confused - the query was for MRv2 - http://goo.gl/sckmU - but the exceptions I am getting for MRv1 and MRv2 are the same. – Praveen Sripati Oct 01 '11 at 08:38
  • My goal is to run it not only from Eclipse, but in the future as a normal function from a web application .war. – mmatloka Oct 01 '11 at 09:17
  • @mich Did you ever get the pseudo-distributed Hadoop working? I'm starting to learn Hadoop and thought I'd install it in pseudo-distributed mode on an old desktop where I've installed Linux. – user949300 Dec 23 '11 at 17:34
  • How did you generate/package the jar for the code? As the error says, the Map class is not visible. And did you create a jar in the first place? – abhinav Mar 21 '13 at 06:58

3 Answers


You have to bundle your custom mapper/reducer implementations in a jar.

job.setJarByClass(TestHadoop.class); 

will then look up that jar and ship it to the cluster.
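
If the job is launched straight from Eclipse (no `hadoop jar` invocation), one workaround is to export the project as a jar first and point the job configuration at it explicitly. A minimal sketch, assuming a hypothetical jar path:

```java
// Export the project as a jar (e.g. Eclipse: File > Export > JAR file),
// then reference it so the TaskTrackers can load TestHadoop$Map / TestHadoop$Reduce.
// "/path/to/testhadoop.jar" is a placeholder for wherever the jar was exported.
conf.set("mapred.jar", "/path/to/testhadoop.jar");
```

This is the same property `job.setJarByClass(...)` tries to fill in by locating the jar that contains the given class, which fails when the classes only exist as loose `.class` files in the IDE's build directory.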

oae

I know I'm probably way too late, but try declaring Map and Reduce as public, too.
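
That is, the nested classes would be declared as:

```java
// Public visibility so Hadoop can instantiate them reflectively.
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { /* ... */ }

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { /* ... */ }
```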

Chris Gerken

The mapred.job.tracker value should be a plain host:port address, not an hdfs:// URL.

Also make the Mapper and Reducer classes public.

// error
conf.set("mapred.job.tracker", "hdfs://" + host + ":8021");
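
A corrected configuration might look like this (a sketch; the ports assume the CDH3 defaults from the question):

```java
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://" + host + ":8020"); // NameNode: hdfs:// scheme
conf.set("mapred.job.tracker", host + ":8021");          // JobTracker: plain host:port
```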
Radim Köhler
user2793692