I am writing an application in Scala that uses Spark. I am packaging the app using Maven and running into problems when constructing an "uber" or "fat" jar.

The application runs fine inside an IDE, or when I put the individual (non-uber) dependency jars on the Java class path, but it fails when I give the uber jar as the class path, i.e.

java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt 

does not work. I get the following error message:

ERROR SparkContext: Error initializing SparkContext.
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
    at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
    at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245)
    at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
    at debug.spark_example.Example$.main(Example.scala:9)
    at debug.spark_example.Example.main(Example.scala)

I would really appreciate help understanding what I need to add to the pom.xml file and why I need to add it to get this to work.

I have searched online and found the following resources, which I tried (see in the pom), but could not get to work:

1) Spark User Mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Packaging-a-spark-job-using-maven-td5615.html

2) how to package spark scala application

I have a simple example that demonstrates this problem, a one-class project (src/main/scala/debug/spark_example/Example.scala):

package debug.spark_example

import org.apache.spark.{SparkConf, SparkContext}

object Example {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Test").setMaster("local[2]"))
    val lines = sc.textFile(args(0))
    val lineLengths = lines.map(s => s.length)
    val totalLength = lineLengths.reduce((a, b) => a + b)
    lineLengths.foreach(println)
    println(totalLength)
  }
}

Here is the pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>debug.spark-example</groupId>
  <artifactId>spark-example</artifactId>
  <version>0.1-SNAPSHOT</version>
  <inceptionYear>2015</inceptionYear>
  <properties>
    <scala.majorVersion>2.11</scala.majorVersion>
    <scala.minorVersion>.2</scala.minorVersion>
    <spark.version>1.4.1</spark.version>
  </properties>


  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.majorVersion}${scala.minorVersion}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.majorVersion}</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>


<build>
  <sourceDirectory>src/main/scala</sourceDirectory>
  <plugins>
    <plugin>
      <groupId>org.scala-tools</groupId>
      <artifactId>maven-scala-plugin</artifactId>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-eclipse-plugin</artifactId>
      <configuration>
        <downloadSources>true</downloadSources>
        <buildcommands>
          <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
        </buildcommands>
        <additionalProjectnatures>
          <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
        </additionalProjectnatures>
        <classpathContainers>
          <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
          <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
        </classpathContainers>
      </configuration>
    </plugin>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4</version>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>attached</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <tarLongFileMode>gnu</tarLongFileMode>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.2</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <minimizeJar>false</minimizeJar>
            <createDependencyReducedPom>false</createDependencyReducedPom>
            <artifactSet>
              <includes>
                <!-- Include here the dependencies you want to be packed in your fat jar -->
                <include>*:*</include>
              </includes>
            </artifactSet>
            <filters>
              <filter>
                <artifact>*:*</artifact>
                <excludes>
                  <exclude>META-INF/*.SF</exclude>
                  <exclude>META-INF/*.DSA</exclude>
                  <exclude>META-INF/*.RSA</exclude>
                </excludes>
              </filter>
            </filters>
            <transformers>
              <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>reference.conf</resource>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.7</version>
      <configuration>
        <skipTests>true</skipTests>
      </configuration>
    </plugin>
  </plugins>
</build>
<reporting>
  <plugins>
    <plugin>
      <groupId>org.scala-tools</groupId>
      <artifactId>maven-scala-plugin</artifactId>
    </plugin>
  </plugins>
</reporting>
</project>

Many thanks in advance for your help.

br19
  • Can you elaborate on does not work? – Holden Sep 05 '15 at 20:53
  • @Holden I added the error message I am getting to the question. Thanks for looking at this! – br19 Sep 05 '15 at 23:40
  • Did you look into the Akka instructions for shade: http://doc.akka.io/docs/akka/snapshot/general/configuration.html. – Edmon Sep 06 '15 at 01:07
  • @Edmon Yes. I'm not experienced using Maven nor Akka, but I tried the shade plugin example given in those instructions and the version given in the Spark user guide (linked in the question). I also tried adding a src/main/resources/reference.conf file like [this](https://github.com/akka/akka/blob/master/akka-actor/src/main/resources/reference.conf). All of these resulted in the error message above. – br19 Sep 06 '15 at 02:09
  • Also, it seems that multiple Akka config values (possibly all of them) are not being found by the SparkConf object. If I set the akka.version manually, i.e. `new SparkConf().setAppName("Test").setMaster("local[2]").set("akka.version","2.1")`, then it says that `akka.actor.guardian-supervisor-strategy` is not set. – br19 Sep 06 '15 at 02:13
  • You're using jar with dependencies, it looks exactly like the problem described here http://stackoverflow.com/questions/31011243/no-configuration-setting-found-for-key-akka-version/31011315#31011315 – Zoltán Oct 14 '15 at 15:12
  • @Zoltán Thank you. I tried using the shade plugin w/o the assembly plug in as suggested by the answer by Jeff S. below to no avail. Perhaps, I need to specify more Akka configurations? Though, from my limited knowledge of Spark, this is not necessary? I was able to get it to work using the Spark-Submit script (see answer below). – br19 Oct 14 '15 at 15:36

3 Answers

It seems that the Spark submit script must be used to run the program.

Rather than:

java -Xmx2G -cp target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar debug.spark_example.Example data.txt 

Do something like:

<path-to>/spark-1.4.1/bin/spark-submit --class debug.spark_example.Example --master local[2] target/spark-example-0.1-SNAPSHOT-jar-with-dependencies.jar data.txt

It also seems to work without the shaded jar, using only the jar-with-dependencies. The following pom.xml file worked for me:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>debug.spark-example</groupId>
    <artifactId>spark-example</artifactId>
    <version>0.1-SNAPSHOT</version>
    <inceptionYear>2015</inceptionYear>
    <properties>
        <scala.majorVersion>2.11</scala.majorVersion>
        <scala.minorVersion>.2</scala.minorVersion>
        <spark.version>1.4.1</spark.version>
    </properties>
    <repositories>
        <repository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
    </pluginRepositories>
    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.majorVersion}${scala.minorVersion}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.majorVersion}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-eclipse-plugin</artifactId>
                <configuration>
                    <downloadSources>true</downloadSources>
                    <buildcommands>
                        <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
                    </buildcommands>
                    <additionalProjectnatures>
                        <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
                    </additionalProjectnatures>
                    <classpathContainers>
                        <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
                        <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
                    </classpathContainers>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>attached</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <tarLongFileMode>gnu</tarLongFileMode>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
        </plugins>
    </build>
    <reporting>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
            </plugin>
        </plugins>
    </reporting>
</project>
br19

This may have something to do with the order of your Maven plugins. You're using both the "maven-assembly-plugin" and the "maven-shade-plugin" in your project, and both are bound to the same phase of the Maven lifecycle. When this happens, Maven executes the plugins in the order in which they appear in the plugins section, so in your case it runs the assembly plugin first, then the shade plugin.

Based on the output jar you're trying to run and the shade transformation you have, you probably want the opposite order. However, you may not even need the assembly plugin for your use case. You might be able to use the target/spark-example-0.1-SNAPSHOT-shaded.jar file.

<plugins>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <!-- SNIP -->
  </plugin>
  <plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <!-- SNIP -->
  </plugin>
</plugins>
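
To see which of the jars actually carries the merged configuration, it may help to count the copies of reference.conf that the class loader can see. This is only a minimal diagnostic sketch using the standard library; the object name ReferenceConfCheck and the helper findResources are made up for illustration and are not part of Spark or Akka.

```scala
import java.net.URL
import scala.collection.mutable.ListBuffer

// Hypothetical helper, not part of Spark or Akka.
object ReferenceConfCheck {

  /** Returns every copy of the resource `name` that `loader` can see. */
  def findResources(name: String, loader: ClassLoader): List[URL] = {
    val found = ListBuffer.empty[URL]
    val urls = loader.getResources(name)
    while (urls.hasMoreElements) found += urls.nextElement()
    found.toList
  }

  def main(args: Array[String]): Unit = {
    val copies = findResources("reference.conf", getClass.getClassLoader)
    println(s"reference.conf copies on the class path: ${copies.size}")
    copies.foreach(url => println(s"  $url"))
  }
}
```

Run it once with the individual dependency jars on the class path and once with only the uber jar. With separate jars you would typically see one URL per jar that ships a reference.conf. With a single uber jar there is exactly one copy, so what matters is whether that one file contains the merged contents (for example the akka.version key) or just the contents of one dependency.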
Jeff S
  • Thank you for your answer Jeff. I'm still having trouble getting this to work. I tried reversing the order of the plugins and using the shaded jar. Reversing the order didn't change the uber jar. When using the shaded jar, the error message is: `akka.ConfigurationException: Type [akka.dispatch.BoundedControlAwareMessageQueueSemantics] specified as akka.actor.mailbox.requirement [akka.actor.mailbox.bounded-control-aware-queue-based] in config can't be loaded due to [akka.dispatch.BoundedControlAwareMessageQueueSemantics] ` – br19 Oct 09 '15 at 00:58

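The Akka docs helped me fix the issue. If you are using Shade, you must specify an AppendingTransformer for reference.conf, so that the reference.conf files from all dependency jars are concatenated instead of only one of them ending up in the uber jar. The second transformer below sets the manifest's Main-Class; akka.Main is the value from the Akka docs example, so replace it with your own main class (here that would be debug.spark_example.Example) if you want to launch the jar with java -jar: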

                            <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                <resource>reference.conf</resource>
                            </transformer>
                            <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <manifestEntries>
                                    <Main-Class>akka.Main</Main-Class>
                                </manifestEntries>
                            </transformer>
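
As a rough illustration of why the appending step matters, here is a toy model in plain Scala. It is not the Shade plugin's code and not a real HOCON parser; the object ReferenceConfMerge, its methods, and the two sample config strings are all hypothetical. It only contrasts "one copy survives" with "all copies are concatenated".

```scala
// Toy model only: names and sample contents are hypothetical.
object ReferenceConfMerge {

  /** Models a merge without a transformer: a single copy of the file survives.
    * Which copy survives depends on the tool and the jar order; here it is the first. */
  def keepOnlyOne(files: Seq[String]): String = files.headOption.getOrElse("")

  /** Models an appending merge: every copy is concatenated into one file. */
  def append(files: Seq[String]): String = files.mkString("\n")

  /** Naive line-based check for a `key = value` entry (not a real config parser). */
  def definesKey(conf: String, key: String): Boolean =
    conf.split("\n").exists(_.trim.startsWith(key + " ="))

  def main(args: Array[String]): Unit = {
    // Stand-ins for the reference.conf files of two different dependency jars.
    val remoteConf = "akka.remote.example-setting = on"
    val actorConf  = "akka.version = \"0.0.0-example\""
    val files = Seq(remoteConf, actorConf)

    println(definesKey(keepOnlyOne(files), "akka.version")) // false
    println(definesKey(append(files), "akka.version"))      // true
  }
}
```

In the first case the surviving file happens to be the one without akka.version, which mirrors the "No configuration setting found for key 'akka.version'" error in the question. In the second case every key from every file is still present.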
Ram