
Here's what my build.sbt file looks like:

name := "ProducerExample"

version := "0.1"

scalaVersion := "2.11.12"

run in Compile := { Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)) }

fork in run := true
javaOptions in run ++= Seq(
    "-Dlog4j.debug=true",
    "-Dlog4j.configuration=log4j.properties")

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.4.0",
    "org.apache.spark" %% "spark-streaming-kafka" % "1.6.2",
    "org.apache.kafka" %% "kafka" % "2.2.1"
)

I'm writing a Scala program that uses the libraries mentioned above. When I run it inside IntelliJ, it works. I then run sbt package and locate the generated jar file, but when I try to run the same program via spark-submit using this command:

spark-submit --class ProducerExample /Users/sparker0i/ProducerExample/target/scala-2.11/producerexample_2.11-0.1.jar

I'm getting the following error:

19/06/07 13:07:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/errors/TimeoutException
    at ProducerExample$.main(ProducerExample.scala:16)
    at ProducerExample.main(ProducerExample.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.errors.TimeoutException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 14 more
19/06/07 13:07:52 INFO ShutdownHookManager: Shutdown hook called
19/06/07 13:07:52 INFO ShutdownHookManager: Deleting directory /private/var/folders/7y/xbn9t08j1lbb5fjq03x13hzr0000gn/T/spark-5b3307e0-47f7-4d6a-b5e7-fef0a7f52881

Even when I try including the jar files explicitly, like this:

spark-submit --class ProducerExample /Users/sparker0i/ProducerExample/target/scala-2.11/producerexample_2.11-0.1.jar --jars Downloads/spark-streaming-kafka_2.11-1.6.3.jar --jars Downloads/kafka_2.11-2.2.1.jar

It still throws the same error.
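
(For what it's worth, spark-submit only reads options that appear before the application jar; anything after the jar is passed to the program as its own arguments, and --jars expects a single comma-separated list rather than being repeated. A sketch of the same command with the flags reordered, using the same paths as above:)

spark-submit --class ProducerExample --jars Downloads/spark-streaming-kafka_2.11-1.6.3.jar,Downloads/kafka_2.11-2.2.1.jar /Users/sparker0i/ProducerExample/target/scala-2.11/producerexample_2.11-0.1.jar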

EDIT: Is there any way to do this other than sbt assembly? I have a restriction on the size of the jar file I can upload to HDFS, so sbt assembly isn't an option for me.
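
(One alternative, assuming the machine running spark-submit can reach a Maven repository: the --packages option resolves the given coordinates and their transitive dependencies at launch time, so they don't need to be bundled into the application jar at all. A sketch using the spark-streaming-kafka-0-10 coordinates for Spark 2.4.0 / Scala 2.11 suggested in the comments below:)

spark-submit --class ProducerExample --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.0 /Users/sparker0i/ProducerExample/target/scala-2.11/producerexample_2.11-0.1.jar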

  • 1) You're mixing Spark versions 2) You shouldn't add kafka dependencies outside of what Spark Streaming already has, however you only need `kafka-clients` 3) spark-core should be marked as provided in the sbt 4) you will need an uber jar with all the dependencies in it – OneCricketeer Jun 08 '19 at 01:23
  • 1) hi, can you please clarify on where I'm mixing Spark versions? 2) So you mean I should only add `kafka-clients` and not `spark-streaming-kafka` 3) Do you mean while running in the console or while running inside IntelliJ as well? 4) I did not understand what you meant by an uber jar. Could you please explain? – Sparker0i Jun 08 '19 at 07:47
  • You're putting both Spark 2 and Spark 1.6. You'd have to use `spark-streaming-kafka % 2.4.0`, see the documentation along with the note underneath https://spark.apache.org/docs/2.4.0/streaming-kafka-0-10-integration.html#linking... Also make sure you are running Spark 2.4.0 when you are deploying your code against a real Spark cluster rather than just running locally. sbt assembly puts all libraries required by your app in a single jar, this is an uber jar. – OneCricketeer Jun 08 '19 at 13:35
  • That final jar wouldn't be so large if you marked Spark Core at provided. `"org.apache.spark" %% "spark-core" % "2.4.0" % "provided"` – OneCricketeer Jun 08 '19 at 13:38
  • So when I give `provided` , how do I provide the packages when I run `spark-submit`? – Sparker0i Jun 08 '19 at 17:26
  • The same way. Provided just means those specific dependencies aren't in the final image. Spark core and streaming are already on the classpath – OneCricketeer Jun 09 '19 at 02:55
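
Putting the comment suggestions together, a sketch of how the dependency block might look for Spark 2.4.0 (artifact names are assumed from the linked 0-10 integration docs; spark-streaming-kafka-0-10 pulls in kafka-clients transitively, so the separate kafka dependency should no longer be needed):

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.4.0" % "provided",          // already on the cluster classpath
    "org.apache.spark" %% "spark-streaming" % "2.4.0" % "provided",     // already on the cluster classpath
    "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.0"        // pulls in kafka-clients transitively
)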

0 Answers