
I need to invoke a MapReduce job from a Java application. I use

ToolRunner.run(new Validation(), pathsMoveToFinal.toArray(new String[pathsMoveToFinal.size()]));
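
For reference, Validation looks roughly like this (a simplified sketch; the output path and key/value types here are placeholders):

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import utils.DataValidationExtractorMapper;

    public class Validation extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // One job over all the paths passed in from the application
            Job job = new Job(getConf(), "data validation");
            job.setMapperClass(DataValidationExtractorMapper.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            for (String path : args) {
                FileInputFormat.addInputPath(job, new Path(path));
            }
            FileOutputFormat.setOutputPath(job, new Path("/tmp/validation-out"));
            return job.waitForCompletion(true) ? 0 : 1;
        }
    }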

If I don't set mapred.job.tracker in the conf, the job seems to run forever: the map progress reaches 100%, then drops back down to a lower percentage. If I do set mapred.job.tracker, it complains that the mapper class cannot be found:

java.lang.RuntimeException: java.lang.ClassNotFoundException: utils.DataValidationExtractorMapper
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: utils.DataValidationExtractorMapper
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
... 4 more

Could anyone please give me a hint? Thank you, and have a good weekend.

lucky_start_izumi

3 Answers


Since you're using Maven, I highly recommend baking your dependencies statically into your JAR.

The reason this occurs is that the JVMs running your map and reduce tasks have no knowledge of your client's classpath. Baking in the dependencies is future-proof and stable, and Hadoop will run such a JAR quite happily.
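
One way to bake them in is the maven-assembly-plugin's built-in jar-with-dependencies descriptor (a sketch of the relevant pom.xml section; the exact plugin version and wiring may differ in your build):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- produces an extra *-jar-with-dependencies.jar on mvn package -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>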

MrGomez

Please see my previous answer (and other answers) here:

How to make a monolithic jar.file?

then run it with hadoop jar.

Setting the classpath on shared/unowned boxes can be a big issue, since the jar files have to be replicated to all of the task servers. Add one server, forget to set the classpath, and ouch: the job breaks on some task machines but runs on others. Try debugging that when you have 100 boxes! A monolithic jar lets you encapsulate all of your dependencies into one big distributable jar.
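
For example (the jar name, driver class, and paths below are placeholders):

    hadoop jar validation-job-with-dependencies.jar Validation /input/path /output/path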

Jaime Garza

Solved. It wasn't a Maven issue after all. When starting a MapReduce job from Java code, I have to package the job classes into a jar, because Hadoop copies that jar out to the task JVMs. Thanks for all the suggestions!
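
Concretely, when configuring the job in Java code you also have to tell Hadoop which jar to ship to the task JVMs. A minimal sketch, assuming the new org.apache.hadoop.mapreduce API shown in the stack trace:

    import org.apache.hadoop.mapreduce.Job;
    import utils.DataValidationExtractorMapper;

    // inside Validation.run(), when configuring the job:
    Job job = new Job(getConf(), "data validation");

    // ship the jar that contains this class with the job,
    // so the task JVMs can load the mapper
    job.setJarByClass(DataValidationExtractorMapper.class);

    // or point at the packaged jar explicitly (path is a placeholder):
    // job.getConfiguration().set("mapred.jar", "/path/to/validation-job.jar");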

lucky_start_izumi