
I've been trying to resolve this issue for a while, but I can't seem to find an answer. I am writing a simple Spark application in Scala which instantiates a NiFi receiver, and although it builds successfully with SBT, I receive the following error when I try to run the application using spark-submit:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/nifi/spark/NiFiReceiver
    at <app name>.main(main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.ClassNotFoundException: org.apache.nifi.spark.NiFiReceiver
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 10 more

I have tried a few variations, but this is my build.sbt file:

name := "<application name here>"
version := "1.0"
scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.2" % "provided"
libraryDependencies += "org.apache.nifi" % "nifi-spark-receiver" % "0.7.0"
libraryDependencies += "org.apache.nifi" % "nifi-site-to-site-client" % "0.7.0"

It should be noted that if I change the two nifi lines to use the Scala equivalents (i.e. the first percent sign in each line is replaced with two percent signs), I actually receive the following error when I run "sbt package":

[error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.nifi#nifi-spark-receiver_2.10;0.7.0: not found
[error] unresolved dependency: org.apache.nifi#nifi-site-to-site-client_2.10;0.7.0: not found

As I mentioned before, with the single percentage signs (and therefore using the Java dependencies) I get no error on build, but I do at runtime.
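
As I understand it, the %% in those lines just appends the Scala binary version (here _2.10) to the artifact name, so sbt ends up looking for artifacts that simply don't exist for these Java-only libraries. For illustration:

// "%%" resolves to org.apache.nifi:nifi-spark-receiver_2.10:0.7.0 - the "not found" artifact in the error above
libraryDependencies += "org.apache.nifi" %% "nifi-spark-receiver" % "0.7.0"

// "%" uses the artifact name as-is, which is why this form resolves at build time
libraryDependencies += "org.apache.nifi" % "nifi-spark-receiver" % "0.7.0"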

The relevant part of my application (with certain names removed) is as follows:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import java.time
import java.time._
import org.apache.nifi._
import java.nio.charset._
import org.apache.nifi.spark._
import org.apache.nifi.remote.client._
import org.apache.spark._
import org.apache.nifi.events._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.nifi.remote._
import org.apache.nifi.remote.protocol._
import org.apache.spark.streaming.receiver._
import org.apache.spark.storage._
import java.io._
import org.apache.spark.serializer._

object <app name> {
    def main(args: Array[String]) {

        val nifiUrl = "<nifi url>"
        val nifiReceiverConfig = new SiteToSiteClient.Builder()
            .url(nifiUrl)
            .portName("Data for Spark")
            .buildConfig()

        val conf = new SparkConf().setAppName("<app name>")
        val ssc = new StreamingContext(conf, Seconds(10))
        val packetStream = ssc.receiverStream(new NiFiReceiver(nifiReceiverConfig, StorageLevel.MEMORY_ONLY))

The error refers to the last line here, where the NiFiReceiver is instantiated - it can't seem to find that class anywhere.

I have so far tried a number of approaches (separately), including the following:

1) Finding the jar files for nifi-spark-receiver and nifi-site-to-site-client and adding them to a lib directory in my project.

2) Following this post: https://community.hortonworks.com/articles/12708/nifi-feeding-data-to-spark-streaming.html. I made a copy of spark-defaults.conf.template in my Spark conf directory, renamed it to spark-defaults.conf, and added the two lines from step 1 of that article (substituting the actual names and locations of the files in question). I also made sure I had all the import statements used in the two code examples on that page.

3) Creating a project directory at the root of my application directory, with a file called assembly.sbt inside it containing the following line (as referenced here: https://github.com/sbt/sbt-assembly):

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

I then ran "sbt assembly" instead of "sbt package" to build an uber-jar, but this failed with the same error I got when running "sbt package" with the Scala-style (%%) dependencies in build.sbt:

[error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.nifi#nifi-spark-receiver_2.10;0.7.0: not found
[error] unresolved dependency: org.apache.nifi#nifi-site-to-site-client_2.10;0.7.0: not found

Please let me know if any further information is required. Thanks in advance for any help.


1 Answer


Okay, so I've managed to resolve this, and here is the answer for anyone who may be interested:

The answer was to go back down the uber-jar route and use "sbt assembly" instead of "sbt package", so that the necessary dependency jars are bundled into the application jar.
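
As a quick way to verify the difference, you can list the contents of each jar - the NiFi receiver class only shows up in the assembled one (the jar names below are just sbt's defaults and may differ for your project):

jar tf target/scala-2.10/<application name here>_2.10-1.0.jar | grep NiFiReceiver
jar tf target/scala-2.10/<application name here>-assembly-1.0.jar | grep NiFiReceiver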

1) Create a directory called "project" under the root of the project and place a file called assembly.sbt in it containing the following (the addition compared with my original attempt is the resolvers line):

resolvers += Resolver.url("sbt-plugin-releases-scalasbt", url("http://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/"))

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

2) In the build.sbt file at the root of the project, use the following dependency references for the NiFi libraries (i.e. a single %, so no Scala version suffix is added to the artifact names):

libraryDependencies += "org.apache.nifi" % "nifi-spark-receiver" % "0.7.0"
libraryDependencies += "org.apache.nifi" % "nifi-site-to-site-client" % "0.7.0"

I also marked spark-core and spark-streaming as "provided", i.e.

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.2" % "provided"

This means you'll need to provide Spark separately at runtime (spark-submit already does that), but it keeps the uber-jar from growing even larger.

3) In the same file, add the following merge strategy to deal with conflicts when sbt-assembly combines the dependency jars (this is important - otherwise duplicate files such as those under META-INF cause the build to fail):

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
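
Putting steps 2 and 3 together, the complete build.sbt (using the name, version and Scala version from the question) looks like this:

name := "<application name here>"
version := "1.0"
scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.2" % "provided"
libraryDependencies += "org.apache.nifi" % "nifi-spark-receiver" % "0.7.0"
libraryDependencies += "org.apache.nifi" % "nifi-site-to-site-client" % "0.7.0"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}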

4) Ensure that the relevant import statements are present in your Scala file, e.g.

import org.apache.nifi._
import org.apache.nifi.spark._

etc.
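
For reference, here is a minimal sketch of how those imports and the receiver fit together, based on the code in the question (the object name, URL and port name are placeholders):

import org.apache.nifi.remote.client.SiteToSiteClient
import org.apache.nifi.spark.NiFiReceiver
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NiFiSparkApp { // placeholder object name
    def main(args: Array[String]) {

        // Site-to-site configuration pointing at the NiFi output port
        val nifiReceiverConfig = new SiteToSiteClient.Builder()
            .url("<nifi url>")
            .portName("Data for Spark")
            .buildConfig()

        val conf = new SparkConf().setAppName("NiFiSparkApp")
        val ssc = new StreamingContext(conf, Seconds(10))

        // The class that could not be found at runtime before the uber-jar change
        val packetStream = ssc.receiverStream(
            new NiFiReceiver(nifiReceiverConfig, StorageLevel.MEMORY_ONLY))

        // A simple output operation so the streaming job has work to do
        packetStream.count().print()

        ssc.start()
        ssc.awaitTermination()
    }
}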

Then when you run "sbt assembly" it should build successfully - just reference this jar when calling "spark-submit", i.e.

spark-submit --class "<application class name>" --master "<local or url>" "<path to uber-jar from project root directory>"
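
For example, with sbt-assembly's default output location this ends up looking something like the following (the class name and master are placeholders, and the jar name will depend on your project's name and version):

spark-submit --class "NiFiSparkApp" --master "local[2]" "target/scala-2.10/<application name here>-assembly-1.0.jar"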

It should be noted that the following post was a massive help in finding this solution: java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$ while running TwitterPopularTags
