
I'm trying to communicate with HBase using Spark. I'm using the code below:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.spark.JavaHBaseContext;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);

Scan scan = new Scan();
scan.setCaching(100);

JavaRDD<Tuple2<ImmutableBytesWritable, Result>> hbaseRdd = hbaseContext.hbaseRDD(TableName.valueOf("climate"), scan);

System.out.println("Number of Records found : " + hbaseRdd.count());

If I execute this, I get the following error:

Exception in thread "dag-scheduler-event-loop" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/StoreFileWriter
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.getDeclaredMethod(Class.java:2128)
    at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
    at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
    ...

I did not find any solution via Google. Does anyone have an idea?

--------edit--------

I'm using Maven. My POM looks like this:

<dependencies>   
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>1.3.0</version>
    </dependency>        

    <dependency>
        <groupId>org.sharegov</groupId>
        <artifactId>mjson</artifactId>
        <version>1.4.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.5.2</version>
    </dependency> 

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.5.2</version>
    </dependency>

    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.10</artifactId>
        <version>1.5.0</version>
    </dependency>

    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-xml_2.10</artifactId>
        <version>0.3.5</version>
    </dependency>        

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-spark</artifactId>
        <version>2.0.0-SNAPSHOT</version>                               
    </dependency>

</dependencies>

I'm building my application with dependencies using the maven-assembly-plugin.
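For reference, my plugin configuration is essentially the standard `jar-with-dependencies` setup (the plugin version here is just the one I happen to use; the main class is elided):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.6</version>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>...</mainClass>
            </manifest>
        </archive>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>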

monti

1 Answer


You are getting the NoClassDefFoundError because Spark cannot find the HBase jars on the classpath. You need to supply the required jars to spark-submit explicitly with the --jars parameter when launching the job:

${SPARK_HOME}/bin/spark-submit \
    --jars ${..add hbase jars comma separated...} \
    --class ....
    .........
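As a concrete sketch (the class and jar names below are placeholders for your own): on many installations `hbase mapredcp` prints the minimal HBase classpath colon-separated, which you can convert to the comma-separated list spark-submit expects:

# `hbase mapredcp` prints the minimal HBase jars, colon separated;
# --jars wants them comma separated.
HBASE_JARS=$(hbase mapredcp | tr ':' ',')

${SPARK_HOME}/bin/spark-submit \
    --jars "${HBASE_JARS}" \
    --class your.package.HBaseRead \
    target/your-app-jar-with-dependencies.jar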
Rahul Sharma
  • I'm building the Spark application with Maven (with dependencies); shouldn't every library that is needed be in there? Which HBase libraries would I need? I'm confused. Do I have to add the full HBase library (https://mvnrepository.com/artifact/org.apache.hbase/hbase/1.3.0)? Why? – monti Mar 20 '17 at 21:00
  • No, Maven will not do it for you unless you use an additional Maven plugin (maven-assembly-plugin): http://stackoverflow.com/questions/8425453/maven-build-with-dependencies – Rahul Sharma Mar 20 '17 at 23:20
  • Yes, I do build my jar with dependencies using Maven. I'm going to try to add HBase. – monti Mar 21 '17 at 08:26
  • I added HBase to the POM, but Maven isn't able to download the HBase dependencies. I don't know why, but it tells me `Could not resolve dependencies for project thesis.test:SparkThesisTest:jar:4.3.0-SNAPSHOT`. – monti Mar 21 '17 at 10:41