
I am working on code which uses OpenNLP. My code runs perfectly in Eclipse, but when I run its jar on a cluster, I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/tools/util/ObjectStream
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ClassNotFoundException: opennlp.tools.util.ObjectStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
... 3 more
vefthym
vaibhavs
1 Answer


You need to have the OpenNLP jar available on the classpath of your tasks. There are several options:

  • -libjars and HADOOP_CLASSPATH, see Using the libjars option with Hadoop
  • 'fat jar': build a jar that contains all the necessary jars, submit the fat jar instead
  • install the third-party jars on all nodes (i.e. make the cluster "third-party aware")
  • use the HDFS distributed cache and download the necessary jars in your code

For a lengthier discussion, see How-to: Include Third-Party Libraries in Your MapReduce Job.
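For the first option, a minimal submission sketch might look like this (the jar names, paths, and driver class are illustrative, not from the question; `-libjars` only takes effect if your driver parses generic options via `ToolRunner`/`GenericOptionsParser`):

```shell
# Make the OpenNLP jar visible to the client JVM that launches the job...
export HADOOP_CLASSPATH=/path/to/opennlp-tools.jar

# ...and ship it to the task JVMs with -libjars
# (my-job.jar and com.example.MyDriver are placeholder names)
hadoop jar my-job.jar com.example.MyDriver \
  -libjars /path/to/opennlp-tools.jar \
  input_dir output_dir
```

Note that `HADOOP_CLASSPATH` fixes the `NoClassDefFoundError` on the client side (as in the stack trace above, which comes from `RunJar.main`), while `-libjars` distributes the jar to the map/reduce tasks.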

Remus Rusanu
  • You can use Maven Shade to put the OpenNLP jar into the "fat jar"; this post shows an example of Shade: http://stackoverflow.com/questions/22096909/create-runable-jar-with-maven-3-1-using-maven-dependency-plugin-dosnt-create-ru/22097164#22097164 – Mark Giaconia Apr 05 '14 at 13:03
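Following the comment above, a sketch of the maven-shade-plugin configuration for building such a fat jar (the plugin version and `mainClass` are illustrative assumptions, not from the thread):

```xml
<!-- Add under <build><plugins> in pom.xml; bundles all dependencies,
     including opennlp-tools, into the job jar at package time -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version> <!-- illustrative version -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Sets the Main-Class so `hadoop jar` can find the driver;
               com.example.MyDriver is a placeholder -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.MyDriver</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn package` produces a single jar you can submit directly, avoiding the `ClassNotFoundException` without any cluster-side configuration.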