Similar to my question here but this time it's Java, not Python, causing me problems.
I have followed the steps advised (to the best of my knowledge) here but since I'm using hadoop-2.6.1 I think I should be using the old API, rather than the new API referred to in the example.
I'm working on Ubuntu and the various component versions I have are
- Spark spark-1.5.1-bin-hadoop2.6
- Hadoop hadoop-2.6.1
- Mongo 3.0.8
- Mongo-Hadoop connector jars included via Maven
- Java 1.8.0_66
- Maven 3.0.5
My Java program is basic
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import com.mongodb.hadoop.MongoInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.bson.BSONObject;
public class SimpleApp {
public static void main(String[] args) {
Configuration mongodbConfig = new Configuration();
mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/db.collection");
SparkConf conf = new SparkConf().setAppName("Simple Application");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
mongodbConfig, // Configuration
MongoInputFormat.class, // InputFormat: read from a live cluster.
Object.class, // Key class
BSONObject.class // Value class
);
}
}
It is building fine using Maven (mvn package
) with the following pom file
<project>
<groupId>edu.berkeley</groupId>
<artifactId>simple-project</artifactId>
<modelVersion>4.0.0</modelVersion>
<name>Simple Project</name>
<packaging>jar</packaging>
<version>1.0</version>
<dependencies>
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.1</version>
</dependency>
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-java-driver</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>org.mongodb.mongo-hadoop</groupId>
<artifactId>mongo-hadoop-core</artifactId>
<version>1.4.2</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
I then submit the jar
/usr/local/share/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --class "SimpleApp" --master local[4] target/simple-project-1.0.jar
and get the following error
Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/hadoop/MongoInputFormat
at SimpleApp.main(SimpleApp.java:18)
NOTICE
I edited this question on the 18th December as it had grown far too confusing and verbose. Previous comments might look irrelevant. The context of the question, however, is the same.