
I'm new to Spark (using v2.4.5) and am still trying to figure out the correct way to add external dependencies. When trying to add Kafka streaming to my project, my build.sbt looked like this:

name := "Stream Handler"

version := "1.0"

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.4.5" % "provided",  
    "org.apache.spark" % "spark-streaming_2.11" % "2.4.5" % "provided",
    "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.4.5"
)

This builds successfully, but when running with spark-submit, I get a java.lang.NoClassDefFoundError with KafkaUtils.
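
The streaming code itself follows the usual direct-stream pattern from the Kafka 0-10 integration, roughly along these lines (the broker address, topic name, and group id below are placeholders, not my real values):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object StreamHandler {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Stream Handler")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Kafka consumer settings -- broker, group id, etc. are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "stream-handler",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // This is the call that blows up at runtime when the Kafka
    // integration jar is missing from the application classpath
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("my-topic"), kafkaParams)
    )

    stream.map(record => record.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}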

I was able to get my code working by passing in the dependency through the --packages option, like this:

$ spark-submit [other_args] --packages "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5"

Ideally I would like to set up all the dependencies in the build.sbt, but I'm not sure what I'm doing wrong. Any advice would be appreciated!

Awdrdt
  • the documentation says spark-core should be defined as `spark-core_2.11` https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html#deploying – Bob Apr 05 '20 at 21:03

1 Answer


Your "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.4.5" is wrong.

Change it to the following, as shown on mvnrepository: https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10

libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.5"
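
With scalaVersion set to "2.11.12", %% appends the _2.11 suffix automatically, so the full dependency block would presumably end up like this (keeping the provided scope from the question and assuming the other two modules are also switched to %%):

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"                 % "2.4.5" % "provided",
    "org.apache.spark" %% "spark-streaming"            % "2.4.5" % "provided",
    "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.5"
)
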
Ram Ghadiyaram