
I'm building an Apache Spark application that can both be debugged locally and deployed to a cluster. To do this, I have to declare its dependency on spark-core (a Java/Scala library) so that it fulfils the following requirements:

- Included at compile time (otherwise compilation fails)
- Included at run and test time (for local debugging and unit tests)
- Excluded from the assembly (for deployment to a cluster that already provides spark-core; this reduces the jar size by 70 MB. I'm using the maven-shade plugin to generate the all-inclusive jar, as there are some jar-hell issues that cannot be resolved with maven-assembly)

Unfortunately it looks like custom scopes aren't natively supported by Maven. Is there a way to enable them using some plugin?

tribbloid
  • http://stackoverflow.com/questions/18838944/sbt-how-can-i-add-provided-dependencies-back-to-run-test-tasks-classpath Namely, I want to achieve the same thing using Maven instead of sbt – tribbloid Jul 14 '14 at 22:48
  • What is the actual issue you're having? I'm using Maven to run Spark with no issues – aaronman Jul 15 '14 at 03:25

3 Answers


We do exactly that in our Maven build: we exclude the Spark assembly from being included in the job assembly by adding an exclusion rule to the maven-shade plugin configuration.

<configuration>
    <shadedArtifactAttached>true</shadedArtifactAttached>
    <shadedClassifierName>jar-with-dependencies</shadedClassifierName>
    <artifactSet>
        <excludes>
            <exclude>org.apache.spark:spark-assembly</exclude>
        </excludes>
    </artifactSet>
...
</configuration>
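
In case it helps, here is a rough sketch of where this configuration sits in the POM; the plugin version and the execution binding are assumptions, adjust them to your own build:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version> <!-- assumed version; use whatever your build already pins -->
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <!-- the <configuration> block shown above goes here
                 (it can also be placed at the plugin level) -->
        </execution>
    </executions>
</plugin>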
maasg
  • Thanks a lot! Let me try that first – tribbloid Jul 15 '14 at 13:01
  • OK, I have tested it and it doesn't look very effective: it increases the jar size from 52 MB to 105 MB. It looks like this exclusion is not transitive (all transitive dependencies of spark-assembly are retained in the jar) – tribbloid Jul 18 '14 at 01:31
  • We build against a Spark assembly uber-jar that gets effectively excluded with this rule. Otherwise you need to exclude all the dependencies. – maasg Jul 18 '14 at 06:54
  • I understand, thank you so much for sharing it. If mvnrepository hosted this uber-jar, this would really be an effective solution; otherwise I need to find an alternative way. Let me check that out – tribbloid Jul 18 '14 at 13:58
  • We build our own with our specific Hadoop version and host it privately on S3. You could do the same and host it locally in the dev machine's m2 repo. – maasg Jul 18 '14 at 14:19
  • Thank you for your advice. Well, this is an open source project that is very loosely organized; we can't create a public repo just for that. I've confirmed the uber-jar is not in mvnrepository, so it looks like scope=provided will have to stay for a while until maven-shade adopts the new feature – tribbloid Jul 18 '14 at 15:22

You can use the provided scope for the dependency.

This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. This scope is only available on the compilation and test classpath, and is not transitive.

Ref : http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope

For example:

<dependency>
  <groupId>group-a</groupId>
  <artifactId>artifact-b</artifactId>
  <version>1.0</version>
  <type>bar</type>
  <scope>provided</scope>
</dependency>
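
Applied to the question's spark-core dependency, that would look roughly like this (the Scala suffix and the version property are placeholders for whatever you already use):

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>${spark.version}</version>
  <!-- provided: on the compile and test classpath, but not packaged -->
  <scope>provided</scope>
</dependency>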
Kapil Balagi
  • Unfortunately that won't do: it excludes the dependency at run and test time as well, whereas in our case we only want to exclude it at assembly time – tribbloid Jul 16 '14 at 20:05
  • I don't understand why this has been marked down; this is how I configure my Spark Maven builds, which support local and unit testing as part of the build, and the packaged JAR for production deployment doesn't contain any Spark libraries – Brad Jan 25 '17 at 22:20

You should create two profiles: one for your IDE, with spark-core at the default compile scope, and another one, active by default and used during your build, with provided scope.

<profiles>
    <profile>
        <id>default-without-spark</id>
        <activation>
            <activeByDefault>true</activeByDefault>
        </activation>
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <scope>provided</scope>
            </dependency>
        </dependencies>
    </profile>
    <profile>
        <id>dev</id>
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
            </dependency>
        </dependencies>
    </profile>
</profiles>

You'll get what you want without the drawback of @maasg's solution (all of Spark's transitive dependencies being added to your final jar).
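
Note that the snippets above omit the <version>; one way to supply it once for both profiles is dependencyManagement (the ${spark.version} property name here is an assumption):

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>

Then a plain mvn package builds the deployable jar with spark-core provided, while mvn package -Pdev (or selecting the dev profile in your IDE's Maven integration) deactivates the default profile and puts spark-core back on the classpath for local runs and tests.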

Quentin