Is there any way to specify a complete folder of jars to be pushed to both the driver and the executors? The --jars option of spark-submit accepts a comma-separated list of jar names with their full paths, but that is tedious when there are many jars to push to both the driver and the executors.

user3190018
Nitin Zadage

2 Answers

Question: Is there a way to push a complete folder of jars to both the driver and the executors?

Yes. You can build an uber jar, which is a self-contained distribution with all dependencies packed inside.

For example, if you are using Maven, you can use the Maven Shade plugin or the Assembly plugin for this. Below is a Shade example.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.maventest</groupId>
    <artifactId>mytest</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>mytest</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>2.3</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <finalName>uber-${project.artifactId}-${project.version}</finalName>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

If you are using sbt, see this.

Your spark-submit command will then look like this:

spark-submit [PATH_TO_YOUR_UBER_JAR]/[YOUR_UBER_JAR].jar
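
As a rough sketch of how the pieces above fit together: with the finalName from the POM, `mvn package` should leave the shaded jar under `target/`. The helper names and the main class `com.maventest.App` below are hypothetical placeholders, not part of the original project, and the script only prints the command so it can be checked before running it.

```shell
#!/usr/bin/env bash

# Derive the shaded jar path from the POM coordinates,
# following the finalName pattern uber-<artifactId>-<version>.
uber_jar_path() {
  local artifact_id="$1" version="$2"
  printf 'target/uber-%s-%s.jar\n' "$artifact_id" "$version"
}

# Print (not run) the spark-submit command for inspection.
# The main class is an assumed placeholder; use your own entry point.
submit_command() {
  local main_class="$1" jar="$2"
  printf 'spark-submit --class %s %s\n' "$main_class" "$jar"
}

JAR="$(uber_jar_path mytest 1.0-SNAPSHOT)"
submit_command com.maventest.App "$JAR"
```

Because every dependency is inside the single jar, no --jars list is needed on the command line.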

Further reading: for example, Google's article "Managing Java dependencies for Apache Spark applications".

Ram Ghadiyaram

When running Spark on YARN, you can set spark.yarn.archive or spark.yarn.jars in the spark-defaults.conf configuration file.

spark.yarn.archive is intended for distributing a single archive that contains all the jars you need on your executors.

spark.yarn.jars is for listing separate jars.

You can find more information in the official docs.
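
A minimal sketch of what the two settings might look like as spark-defaults.conf lines. The helper name and the HDFS locations are assumptions for illustration only; check the Running on YARN documentation for which jars each property is expected to contain before relying on them.

```shell
#!/usr/bin/env bash

# Emit one spark-defaults.conf line (key, space, value) for the chosen property.
yarn_jars_line() {
  local kind="$1" location="$2"
  case "$kind" in
    archive) printf 'spark.yarn.archive %s\n' "$location" ;;
    jars)    printf 'spark.yarn.jars %s\n' "$location" ;;
    *)       echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}

# Hypothetical locations; replace with where your archive or jars actually live.
yarn_jars_line archive 'hdfs:///apps/spark/spark-libs.zip'
yarn_jars_line jars 'hdfs:///apps/spark/jars/*.jar'
```

Only one of the two lines would normally be added to spark-defaults.conf, depending on whether the jars are shipped as one archive or individually.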