
I need to invoke a MapReduce job from a Java application. I use

ToolRunner.run(new Validation(), pathsMoveToFinal.toArray(new String[pathsMoveToFinal.size()]));
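
For reference, Validation looks roughly like this (a simplified sketch; the output path and key/value types here are placeholders):

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import utils.DataValidationExtractorMapper;

    public class Validation extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // One job over all the paths passed in from the application
            Job job = new Job(getConf(), "data validation");
            job.setMapperClass(DataValidationExtractorMapper.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            for (String path : args) {
                FileInputFormat.addInputPath(job, new Path(path));
            }
            FileOutputFormat.setOutputPath(job, new Path("/tmp/validation-out"));
            return job.waitForCompletion(true) ? 0 : 1;
        }
    }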

If I don't set mapred.job.tracker in the conf, the job seems to run forever: the map progress reaches 100%, then drops back down to a lower percentage. If I do set mapred.job.tracker, it complains that the mapper class cannot be found:

java.lang.RuntimeException: java.lang.ClassNotFoundException: utils.DataValidationExtractorMapper
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: utils.DataValidationExtractorMapper
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
... 4 more

Could anyone please give me a hint? Thank you, and have a good weekend.

lucky_start_izumi

3 Answers


Since you're using Maven, I highly recommend baking your dependencies statically into your JAR.

The reason this occurs is that the JVMs running your map and reduce tasks have no knowledge of your client's classpath. Baking in the dependencies is future-proof and stable, and Hadoop will run such a JAR quite happily.
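
One way to bake them in is the maven-assembly-plugin's built-in jar-with-dependencies descriptor (a sketch of the relevant pom.xml section; the exact plugin version and wiring may differ in your build):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- produces an extra *-jar-with-dependencies.jar on mvn package -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>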

MrGomez

Please see my previous answer (and other answers) here:

How to make a monolithic jar.file?

then run it with hadoop jar.

Setting the classpath on shared/unowned boxes can be a big issue, since the jar files have to be replicated to all of the task servers. Add one server, forget to set the classpath, and ouch: the job breaks on some task machines but runs on others. Try debugging that when you have 100 boxes! A monolithic jar lets you encapsulate all of your dependencies into one big distributable jar.
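
For example (the jar name, driver class, and paths below are placeholders):

    hadoop jar validation-job-with-dependencies.jar Validation /input/path /output/path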

Jaime Garza

Solved. It wasn't a Maven issue after all. When starting a MapReduce job from Java code, I have to package the job classes into a jar, because Hadoop copies that jar out to the task JVMs. Thanks for all the suggestions!
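
Concretely, when configuring the job in Java code you also have to tell Hadoop which jar to ship to the task JVMs. A minimal sketch, assuming the new org.apache.hadoop.mapreduce API shown in the stack trace:

    import org.apache.hadoop.mapreduce.Job;
    import utils.DataValidationExtractorMapper;

    // inside Validation.run(), when configuring the job:
    Job job = new Job(getConf(), "data validation");

    // ship the jar that contains this class with the job,
    // so the task JVMs can load the mapper
    job.setJarByClass(DataValidationExtractorMapper.class);

    // or point at the packaged jar explicitly (path is a placeholder):
    // job.getConfiguration().set("mapred.jar", "/path/to/validation-job.jar");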

lucky_start_izumi