
Here's what my build.sbt file looks like:

name := "ProducerExample"

version := "0.1"

scalaVersion := "2.11.12"

run in Compile := { Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)) }

fork in run := true
javaOptions in run ++= Seq(
    "-Dlog4j.debug=true",
    "-Dlog4j.configuration=log4j.properties")

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.4.0",
    "org.apache.spark" %% "spark-streaming-kafka" % "1.6.2",
    "org.apache.kafka" %% "kafka" % "2.2.1"
)

I'm writing a Scala program that uses the libraries mentioned above. When I run it inside IntelliJ, it works. I then run sbt package and locate the generated jar file, but when I try to run the same program via spark-submit using this command:

spark-submit --class ProducerExample /Users/sparker0i/ProducerExample/target/scala-2.11/producerexample_2.11-0.1.jar

I'm getting the following error:

19/06/07 13:07:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/errors/TimeoutException
    at ProducerExample$.main(ProducerExample.scala:16)
    at ProducerExample.main(ProducerExample.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.errors.TimeoutException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 14 more
19/06/07 13:07:52 INFO ShutdownHookManager: Shutdown hook called
19/06/07 13:07:52 INFO ShutdownHookManager: Deleting directory /private/var/folders/7y/xbn9t08j1lbb5fjq03x13hzr0000gn/T/spark-5b3307e0-47f7-4d6a-b5e7-fef0a7f52881

Even when I try including the jar files explicitly, like this:

spark-submit --class ProducerExample /Users/sparker0i/ProducerExample/target/scala-2.11/producerexample_2.11-0.1.jar --jars Downloads/spark-streaming-kafka_2.11-1.6.3.jar --jars Downloads/kafka_2.11-2.2.1.jar

It still throws the same error.
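
(For what it's worth, spark-submit only reads options that appear before the application jar; anything after the jar is passed to the program as its own arguments, and --jars expects a single comma-separated list rather than being repeated. A sketch of the same command with the flags reordered, using the same paths as above:)

spark-submit --class ProducerExample --jars Downloads/spark-streaming-kafka_2.11-1.6.3.jar,Downloads/kafka_2.11-2.2.1.jar /Users/sparker0i/ProducerExample/target/scala-2.11/producerexample_2.11-0.1.jar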

EDIT: Is there any way to do this other than sbt assembly? I have a restriction on the size of the jar file I can upload to HDFS, so sbt assembly isn't an option for me.
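
(One alternative, assuming the machine running spark-submit can reach a Maven repository: the --packages option resolves the given coordinates and their transitive dependencies at launch time, so they don't need to be bundled into the application jar at all. A sketch using the spark-streaming-kafka-0-10 coordinates for Spark 2.4.0 / Scala 2.11 suggested in the comments below:)

spark-submit --class ProducerExample --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.0 /Users/sparker0i/ProducerExample/target/scala-2.11/producerexample_2.11-0.1.jar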

  • 1) You're mixing Spark versions 2) You shouldn't add kafka dependencies outside of what Spark Streaming already has, however you only need `kafka-clients` 3) spark-core should be marked as provided in the sbt 4) you will need an uber jar with all the dependencies in it – OneCricketeer Jun 08 '19 at 01:23
  • 1) hi, can you please clarify on where I'm mixing Spark versions? 2) So you mean I should only add `kafka-clients` and not `spark-streaming-kafka` 3) Do you mean while running in the console or while running inside IntelliJ as well? 4) I did not understand what you meant by an uber jar. Could you please explain? – Sparker0i Jun 08 '19 at 07:47
  • You're putting both Spark 2 and Spark 1.6. You'd have to use `spark-streaming-kafka % 2.4.0`, see the documentation along with the note underneath https://spark.apache.org/docs/2.4.0/streaming-kafka-0-10-integration.html#linking... Also make sure you are running Spark 2.4.0 when you are deploying your code against a real Spark cluster rather than just running locally. sbt assembly puts all libraries required by your app in a single jar, this is an uber jar. – OneCricketeer Jun 08 '19 at 13:35
  • That final jar wouldn't be so large if you marked Spark Core at provided. `"org.apache.spark" %% "spark-core" % "2.4.0" % "provided"` – OneCricketeer Jun 08 '19 at 13:38
  • So when I give `provided` , how do I provide the packages when I run `spark-submit`? – Sparker0i Jun 08 '19 at 17:26
  • The same way. Provided just means those specific dependencies aren't in the final image. Spark core and streaming are already on the classpath – OneCricketeer Jun 09 '19 at 02:55
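
Putting the comment suggestions together, a sketch of how the dependency block might look for Spark 2.4.0 (artifact names are assumed from the linked 0-10 integration docs; spark-streaming-kafka-0-10 pulls in kafka-clients transitively, so the separate kafka dependency should no longer be needed):

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.4.0" % "provided",          // already on the cluster classpath
    "org.apache.spark" %% "spark-streaming" % "2.4.0" % "provided",     // already on the cluster classpath
    "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.0"        // pulls in kafka-clients transitively
)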

0 Answers