
This is similar to my question here, but this time it's Java, not Python, that's causing me problems.

I have followed the steps advised here (to the best of my knowledge), but since I'm using hadoop-2.6.1 I think I should be using the old API rather than the new API referred to in the example.
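For reference, my understanding is that the old-API equivalent would look roughly like the sketch below. I haven't verified this; it assumes the connector also ships an old-API (mapred) input format class, com.mongodb.hadoop.mapred.MongoInputFormat, and that sc is an existing JavaSparkContext.

import org.apache.hadoop.mapred.JobConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.bson.BSONObject;

// Sketch only: old (mapred) API variant, not verified against my setup.
JobConf jobConf = new JobConf();
jobConf.set("mongo.input.uri", "mongodb://localhost:27017/db.collection");

JavaPairRDD<Object, BSONObject> documents = sc.hadoopRDD(
    jobConf,                                           // old-API job configuration
    com.mongodb.hadoop.mapred.MongoInputFormat.class,  // old-API InputFormat (assumed class name)
    Object.class,                                      // key class
    BSONObject.class                                   // value class
);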

I'm working on Ubuntu, and the various component versions I have are:

  • Spark spark-1.5.1-bin-hadoop2.6
  • Hadoop hadoop-2.6.1
  • Mongo 3.0.8
  • Mongo-Hadoop connector jars included via Maven
  • Java 1.8.0_66
  • Maven 3.0.5

My Java program is basic:

import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import com.mongodb.hadoop.MongoInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.bson.BSONObject;

public class SimpleApp {
  public static void main(String[] args) {
    Configuration mongodbConfig = new Configuration();
    mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
    mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/db.collection");
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
        mongodbConfig,            // Configuration
        MongoInputFormat.class,   // InputFormat: read from a live cluster.
        Object.class,             // Key class
        BSONObject.class          // Value class
    );
  }
}

It builds fine using Maven (mvn package) with the following pom file:

<project>
  <groupId>edu.berkeley</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongo-java-driver</artifactId>
      <version>3.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb.mongo-hadoop</groupId>
      <artifactId>mongo-hadoop-core</artifactId>
      <version>1.4.2</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

I then submit the jar

/usr/local/share/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --class "SimpleApp" --master local[4] target/simple-project-1.0.jar

and get the following error

Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/hadoop/MongoInputFormat
    at SimpleApp.main(SimpleApp.java:18)

NOTICE

I edited this question on 18 December because it had grown far too confusing and verbose. Earlier comments may therefore look irrelevant. The context of the question, however, is the same.

  • Find the project that has the "SampleSparkMongoProgram.java" class and build that project using the "clean install" Maven command. Then update the project (Alt+F5) and select all the projects. – Karpinski Dec 04 '15 at 17:30

1 Answer


I faced the same problem, but after a lot of trial and error I got it working with the code below. I'm running a Maven project with NetBeans on Ubuntu and Java 7. Hope this helps.

Include the maven-shade-plugin if there are any conflicts between classes.
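A minimal sketch of what adding the shade plugin to the pom's <build><plugins> section might look like (the plugin version here is illustrative, not taken from my project):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.1</version>
    <executions>
        <execution>
            <!-- build an uber-jar containing the dependencies at package time -->
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Packaging the dependencies into the jar this way (or passing the connector jars to spark-submit with --jars) is what makes classes like MongoInputFormat available at runtime; a NoClassDefFoundError for that class generally means it was missing from the runtime classpath.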

P.S.: I don't know about your particular error, but I faced plenty like it, and this code runs perfectly.

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>1.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>1.5.1</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.14</version>
        </dependency>
        <dependency>
            <groupId>org.mongodb.mongo-hadoop</groupId>
            <artifactId>mongo-hadoop-core</artifactId>
            <version>1.4.1</version>
        </dependency>
    </dependencies>

Java code

    // Hadoop configuration pointing the connector at the MongoDB collection
    Configuration conf = new Configuration();
    conf.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
    conf.set("mongo.input.uri", "mongodb://localhost:27017/databasename.collectionname");

    SparkConf sconf = new SparkConf().setMaster("local").setAppName("Spark UM Jar");
    JavaSparkContext sc = new JavaSparkContext(sconf);

    // Read (ObjectId, BSONObject) pairs from MongoDB and map each document to a User
    JavaRDD<User> userMaster = sc.newAPIHadoopRDD(conf, MongoInputFormat.class, Object.class, BSONObject.class)
            .map(new Function<Tuple2<Object, BSONObject>, User>() {
                @Override
                public User call(Tuple2<Object, BSONObject> v1) throws Exception {
                    // build and return a User from the BSONObject in v1._2()
                }
            });