
I'm trying to connect Spark Streaming to HBase. All my code really does is follow this example code, but I'm getting a strange runtime error:

Exception in thread "streaming-job-executor-8" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at buri.sparkour.HBaseInteractor.<init>(HBaseInteractor.java:26)
at buri.sparkour.JavaCustomReceiver.lambda$main$94c29978$1(JavaCustomReceiver.java:104)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$2.apply(JavaDStreamLike.scala:280)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:256)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:256)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:255)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

There are a few questions on Stack Overflow around this, all of which deal with adding paths to the correct jar files. I tried building an "uber" jar with SBT and passing it to spark-submit, but I still get this error.
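
For reference, this is roughly how I'm submitting the job (the jar name and master value below are illustrative, not my exact ones):

spark-submit \
  --class buri.sparkour.JavaCustomReceiver \
  --master local[2] \
  target/scala-2.11/sparkour-assembly.jar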

Here's my build.sbt file:


val sparkVersion = "2.1.0"

val hadoopVersion = "2.7.3"
val hbaseVersion  = "1.3.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.commons" % "commons-csv" % "1.2" % "provided",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.5.2" % "provided",
  "org.apache.hbase" % "hbase-spark" % "2.0.0-alpha-1" % "provided",
  "org.apache.hbase" % "hbase-client" % hbaseVersion,
  "org.apache.hadoop" % "hadoop-common" % hadoopVersion % "provided",
  "org.apache.hbase" % "hbase-common" % hbaseVersion,
  "org.apache.hbase" % "hbase-server" % hbaseVersion % "provided",
  "org.apache.hbase" % "hbase" % hbaseVersion
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

Once the uber jar is built, I can see that HBaseContext.class does indeed exist inside it, so I'm not sure why the class can't be found at runtime.
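
I verified that with something along these lines (the jar path is a placeholder for my actual assembly):

jar tf target/scala-2.11/sparkour-assembly.jar | grep -i hbase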

Any ideas or pointers?

(I've also tried setting the class paths via spark.driver.extraClassPath and the like, but that doesn't work either.)
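
For completeness, that attempt looked roughly like this; the HBase lib path is a placeholder for wherever HBase lives on the cluster:

spark-submit \
  --conf spark.driver.extraClassPath=/usr/lib/hbase/lib/* \
  --conf spark.executor.extraClassPath=/usr/lib/hbase/lib/* \
  --class buri.sparkour.JavaCustomReceiver \
  target/scala-2.11/sparkour-assembly.jar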

Itchydon

1 Answer


Take a look at this post regarding NoClassDefFoundError. I can't speak to the build.sbt since I use Maven, but the dependencies look fine.
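
If the uber jar route keeps failing, one more thing worth trying is shipping the HBase jars to the driver and executors explicitly with --jars. A sketch, with placeholder paths for wherever HBase is installed:

spark-submit \
  --jars /usr/lib/hbase/lib/hbase-common-1.3.1.jar,/usr/lib/hbase/lib/hbase-client-1.3.1.jar \
  --class buri.sparkour.JavaCustomReceiver \
  your-assembly.jar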

gorros