I want to run the MongoDB Hadoop Streaming connector, so I downloaded a compatible version of Hadoop, 2.2.0 (see https://github.com/mongodb/mongo-hadoop/blob/master/README.md#apache-hadoop-22).

I cloned the mongo-hadoop git repository and set hadoopRelease to 2.2 in build.sbt:

$ cat build.sbt
name := "mongo-hadoop"

organization := "org.mongodb"

hadoopRelease in ThisBuild := "2.2"

Then I launched:

$ ./sbt package
$ ./sbt mongo-hadoop-streaming/assembly
$ cp core/target/mongo-hadoop-core_2.2.0-1.2.0.jar ../hadoop-2.2.0/lib/
$ cp mongo-2.7.3.jar ../hadoop-2.2.0/lib/ # Previously downloaded
$ cd ../hadoop-2.2.0/
$ ./bin/hadoop jar ../mongo-hadoop/streaming/target/mongo-hadoop-streaming-assembly-1.1.0.jar -mapper ...

And I get this:

Exception in thread "main" java.lang.ClassNotFoundException: com.mongodb.hadoop.streaming.MongoStreamJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

I don't understand why. I have tried almost every version that is supposed to support streaming, but I always get the same error!

For the record, I am on Mac OS X. Thanks!

Julien Fouilhé

1 Answer

That is actually a bug that will be fixed in an upcoming release. The need for that main class was removed, but the generated manifest was not updated to match. You can work around it by removing the Main-Class entry from the manifest in the streaming jar. If you run the script below in the directory where your streaming jar is, passing the jar's file name as the only argument, it'll fix that for you:

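#!/bin/sh
# Usage: run in the directory containing the streaming jar,
# with the jar's file name as the only argument.
set -e

JAR=$1
M=META-INF/MANIFEST.MF

mkdir tmp
cd tmp

# Unpack the jar into the scratch directory.
jar xf "../${JAR}"

# Drop the stale Main-Class line from the manifest.
sed -e '/Main-Class/d' "${M}" > "${M}.new"
mv "${M}.new" "${M}"

# Repack all extracted files with the edited manifest, overwriting the original jar.
jar cfm "../${JAR}" "${M}" .

cd ..
rm -r tmp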

It's not super pretty but should get you over the hump. We'll try to get a formal 1.2.1 release out soonish. Here's the jira ticket in the meantime: https://jira.mongodb.org/browse/HADOOP-121
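
To see what the sed step in the script does, here is a minimal sketch that applies the same expression to a made-up manifest. The manifest contents and the variable names are assumptions for illustration only; a real manifest lives at META-INF/MANIFEST.MF inside the jar and will have different entries.

```shell
#!/bin/sh
# Hypothetical sample manifest, not copied from the real streaming jar.
manifest='Manifest-Version: 1.0
Main-Class: com.mongodb.hadoop.streaming.MongoStreamJob
Created-By: example'

# Same sed expression as in the script: delete every line containing "Main-Class".
stripped=$(printf '%s\n' "$manifest" | sed -e '/Main-Class/d')

# Print the manifest as it would look after the fix.
printf '%s\n' "$stripped"

# Count the Main-Class lines that remain; grep -c prints 0 when nothing matches.
remaining=$(printf '%s\n' "$stripped" | grep -c 'Main-Class' || true)
echo "Main-Class lines left: $remaining"
```

On a real jar, `unzip -p <your-jar> META-INF/MANIFEST.MF` should print the manifest so you can confirm the entry is gone after running the script.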

evanchooly
  • Thanks, that resolved my error, but now, instead of com.mongodb.hadoop.streaming.MongoStreamJob, I get `java.lang.ClassNotFoundException: -mapper` when I run: `./bin/hadoop jar ../../mongo-hadoop/streaming/target/mongo-hadoop-streaming-assembly-1.1.0.jar -mapper ../../mongo-hadoop/streaming/examples/enron/enron_map.js -reducer ../../mongo-hadoop/streaming/examples/enron/enron_reduce.js` I don't really know how Java works (that's actually why I would like to use streaming :)) – Julien Fouilhé Mar 06 '14 at 17:24
  • I think you'll want -libjars there instead of 'jar'. See https://github.com/mongodb/mongo-hadoop/blob/master/streaming/README.md#overview for an example – evanchooly Mar 06 '14 at 20:14
  • Thanks, but in that case you should update `streaming/examples/enron/*.sh`, because it is not the same command at all... I couldn't figure out how to make 2.2 work, so I switched to 0.23 and now it works. – Julien Fouilhé Mar 07 '14 at 09:25
  • I'll definitely double check that documentation. Thanks for the heads up. – evanchooly Mar 07 '14 at 20:51
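
The `ClassNotFoundException: -mapper` from the comments is consistent with how `hadoop jar` chooses what to run: when the jar's manifest has no Main-Class, the first argument after the jar path is taken as the class name. The sketch below is a simplified model of that choice, not Hadoop's actual code; the function name `pick_main_class` and the shortened script name are hypothetical.

```shell
#!/bin/sh
# Simplified model (NOT Hadoop's real implementation) of how `hadoop jar` picks a class.
# $1             = Main-Class value from the manifest, or empty if the entry is absent
# remaining args = the arguments given after the jar path
pick_main_class() {
  manifest_main=$1
  shift
  if [ -n "$manifest_main" ]; then
    # Manifest names a class: that class is loaded.
    printf '%s\n' "$manifest_main"
  else
    # No Main-Class: the first argument is treated as the class name.
    printf '%s\n' "$1"
  fi
}

# Before the manifest fix: the stale Main-Class entry is used.
before_fix=$(pick_main_class "com.mongodb.hadoop.streaming.MongoStreamJob" -mapper enron_map.js)
# After the fix: "-mapper" ends up in the class-name position.
after_fix=$(pick_main_class "" -mapper enron_map.js)

echo "before: $before_fix"
echo "after:  $after_fix"
```

Under this model, removing the Main-Class entry is only half of the fix: the command line also has to be one where the first argument after the jar makes sense as a class name, or where the jar being run declares its own entry point.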