
I am attempting to print messages consumed from Kafka via Spark Streaming. However, I keep running into the following error:

16/09/04 16:03:33 ERROR ApplicationMaster: User class threw exception: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$

A few questions have been asked on StackOverflow about this very issue, e.g. https://stackoverflow.com/questions/27710887/kafkautils-class-not-found-in-spark-streaming

The answers given have not resolved this issue for me. I have tried creating an "uber jar" using sbt assembly and that did not work either.

Contents of my sbt build file:

name := "StreamKafka"

version := "1.0"

scalaVersion := "2.10.5"


libraryDependencies ++= Seq(
    "org.apache.kafka" % "kafka_2.10" % "0.8.2.1" % "provided",
    "org.apache.spark" % "spark-streaming_2.10" % "1.6.1" % "provided",
    "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1" % "provided",
    "org.apache.spark" % "spark-core_2.10" % "1.6.1" % "provided" exclude("com.esotericsoftware.minlog", "minlog") exclude("com.esotericsoftware.kryo", "kryo")
)

resolvers ++= Seq(
    "Maven Central" at "https://repo1.maven.org/maven2/"
)


assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf")             => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$")         => MergeStrategy.discard
  case "log4j.properties"                                     => MergeStrategy.discard
  case m if m.toLowerCase.startsWith("meta-inf/services/")    => MergeStrategy.filterDistinctLines
  case "reference.conf"                                       => MergeStrategy.concat
  case PathList(ps @ _*) if ps.last endsWith "pom.properties" => MergeStrategy.discard
  case _                                                      => MergeStrategy.first
}
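
For reference, the consuming code is essentially the standard KafkaUtils receiver pattern; a minimal sketch of what I am running (the ZooKeeper address, consumer group, and topic name below are placeholders, not my real config):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object StreamKafka {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamKafka")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder ZooKeeper quorum, consumer group, and topic -> receiver-thread map
    val messages = KafkaUtils.createStream(ssc, "localhost:2181", "stream-kafka-group", Map("test" -> 1))

    // Each record is a (key, value) pair; print the message values
    messages.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}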
3 Answers

Posting the answer from the comments so that it will be easy for others to solve the issue.

You have to remove "provided" from the Kafka dependencies, so they become:

"org.apache.kafka" % "kafka_2.10" % "0.8.2.1",
"org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1"

For the dependencies to be bundled into the jar, you have to run sbt assembly.

Also make sure that you are running the right jar file. The assembled jar's name is printed in the sbt assembly command's log.
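
For example, with sbt-assembly's default naming and the build file from the question, the assembled jar would typically land under target/scala-2.10/ and be submitted like this (the --class value assumes your main object is called StreamKafka; adjust both to what your build actually produces):

spark-submit --class StreamKafka --master yarn target/scala-2.10/StreamKafka-assembly-1.0.jar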

vishnu viswanath

This may be silly to ask, but does the streamkafka_2.10-1.0.jar contain org/apache/spark/streaming/kafka/KafkaUtils.class?
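
One quick way to check, assuming the jar is in your current directory, is to list its contents with the JDK's jar tool:

jar tf streamkafka_2.10-1.0.jar | grep KafkaUtils

If that prints nothing, the class was not bundled into the jar.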

Anupam Jain

As long as the cluster provides the Kafka and Spark classes at runtime, those dependencies must be excluded from the assembled JAR. If the cluster does not provide them, you should expect errors like this one from the Java classloader during application startup.

An additional benefit of assembling without these dependencies is faster deployment. If the cluster provides the dependencies at runtime, the best option is to omit them from the assembly by marking them % "provided".
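
A minimal sketch of how the dependency list could look under that approach, assuming the cluster ships Spark itself but not the Kafka integration (spark-streaming-kafka is not part of the standard Spark distribution):

libraryDependencies ++= Seq(
  // Provided by the cluster at runtime, so keep them out of the assembly
  "org.apache.spark" % "spark-core_2.10" % "1.6.1" % "provided",
  "org.apache.spark" % "spark-streaming_2.10" % "1.6.1" % "provided",
  // Not shipped with Spark, so these must be bundled into the assembled JAR
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.1",
  "org.apache.kafka" % "kafka_2.10" % "0.8.2.1"
)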