49

I am trying to run a Spark program that needs multiple jar files; with only one jar I am not able to run it. I want to add both jar files, which are in the same location. I have tried the command below, but it shows a dependency error:

spark-submit \
  --class "max" maxjar.jar Book1.csv test \
  --driver-class-path /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar

How can I add another jar file that is in the same directory?

I want to add /usr/lib/spark/assembly/lib/hive-serde.jar.

Ratan Sebastian
Avinash Nishanth S
  • Welcome @avinash, for your next post I recommend you have a look at http://stackoverflow.com/editing-help – AdrieanKhisbe Mar 17 '15 at 12:32
  • `spark-submit [restofyouroptions] --conf "spark.driver.extraClassPath=myjarfile.jar"` – Zahra Sep 19 '17 at 18:49
  • multiple jar files: `"spark.driver.extraClassPath=/path/myjarfile1.jar:/path/myjarfile2.jar"` – Zahra Sep 19 '17 at 18:56
  • @Zahra that didn't work for me; I got a 'No suitable driver found' error. The problem is that the JVM has already started before the 'extraClassPath' conf is set. Is there any way to set it before the JVM starts? – Jai K Apr 20 '21 at 05:40

9 Answers

52

Just use the --jars parameter. Spark will share those jars (comma-separated) with the executors.
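
For the jars from the question, the command would look roughly like this (a sketch; the paths are taken from the question):

spark-submit \
  --class "max" \
  --jars /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/lib/spark/assembly/lib/hive-serde.jar \
  maxjar.jar Book1.csv test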

Prasad Khode
pzecevic
  • I tried comma-separated: spark-submit --class "max" maxjar.jar Book1.csv test /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar,hive-serde.jar, but it doesn't read either of the jars. I get the error org/apache/hadoop/hive/conf/HiveConf – Avinash Nishanth S Mar 18 '15 at 03:47
  • I meant, use it like this: spark-submit --master master_url --jars jar1,jar2 --class classname application_jar – pzecevic Mar 18 '15 at 07:20
  • Actually I want to add multiple jars to my classpath. I don't have access to copy the jars to my local files, so I am just accessing the jars through the classpath – Avinash Nishanth S Mar 18 '15 at 09:26
  • I tried it as well, but it doesn't work: Spark takes into account only the 1st jar and treats the second as the job jar, so it throws an exception saying that the class specified with --class is not found – aName Oct 18 '19 at 14:49
41

Specifying the full path for all additional jars works:

./bin/spark-submit --class "SparkTest" --master local[*] --jars /fullpath/first.jar,/fullpath/second.jar /fullpath/your-program.jar

Or add jars in conf/spark-defaults.conf by adding lines like:

spark.driver.extraClassPath /fullpath/first.jar:/fullpath/second.jar
spark.executor.extraClassPath /fullpath/first.jar:/fullpath/second.jar
Ghasem
user3688187
  • How do I do this on Windows? Because on Windows the path includes a colon, e.g. D:\path – user812142 Feb 27 '20 at 04:54
  • A comma-separated list of packages helped me. Create a spark-defaults.conf file within the bin folder of the Spark folder. In spark-defaults.conf put "spark.jars.packages org.apache.spark:spark-streaming-kafka-0-10_2.12:3.0.2,org.apache.spark:spark-avro_2.12:3.0.2". As you can see, I am getting the 1st package "streaming kafka" and the 2nd package "spark avro". All you have to do is add as many packages as needed, separating them with commas. – Induraj PR Mar 03 '21 at 04:44
23

You can use * to import all the jars in a folder when adding them in conf/spark-defaults.conf:

spark.driver.extraClassPath /fullpath/*
spark.executor.extraClassPath /fullpath/*
Prasad Khode
  • Are you sure? I got "16/10/20 19:56:43 ERROR SparkContext: Jar not found at file:/root/.ivy2/jars/*.jar" – Thomas Decaux Oct 20 '16 at 17:44
  • A relative path works too! My setting is "spark.driver.extraClassPath lib/*" where lib is a directory under the Spark home and all 3rd-party jars are there. – Leon Jul 13 '17 at 08:44
  • This solution works! I had a similar issue where I needed two different JDBC drivers for a multiple-DB-connection scenario, and this approach works a charm! Thank you. – Prajwal Ainapur Oct 17 '22 at 07:32
6

I was trying to connect to MySQL from Python code that was executed using spark-submit.

I was using the HDP sandbox with Ambari. I tried a lot of options, such as --jars, --driver-class-path, etc., but none worked.

Solution

Copy the jar into /usr/local/miniconda/lib/python2.7/site-packages/pyspark/jars/

As of now I'm not sure whether it's a proper solution or a quick hack, but since I'm working on a POC, it kind of works for me.
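
A minimal sketch of that copy step, assuming a pip/conda-installed PySpark; the connector jar name and its source path are just placeholders:

# print the jars directory of the installed PySpark package
python -c "import pyspark, os; print(os.path.join(os.path.dirname(pyspark.__file__), 'jars'))"

# copy the driver jar into that directory
cp /path/to/mysql-connector-java.jar /usr/local/miniconda/lib/python2.7/site-packages/pyspark/jars/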

Ayush Vatsyayan
  • https://meta.stackoverflow.com/q/288160/1434041 though it makes more sense to remove those as part of a larger edit. – Zahra Sep 20 '17 at 14:29
  • Just for reference, since this was one of the first questions I found when searching this on Google: in **AWS EMR** with **Spark 2.x** the jars folder is in `/usr/lib/spark/jars/`. There's an [official tutorial](https://aws.amazon.com/pt/premiumsupport/knowledge-center/emr-permanently-install-library/) from AWS on how to do that. – Daniel Lavedonio de Lima May 15 '21 at 07:44
6

In Spark 2.3 you just need to set the --jars option. The file path should be prefixed with the scheme though, i.e. file:///<absolute path to the jars>, e.g. file:///home/hadoop/spark/externaljars/* or file:///home/hadoop/spark/externaljars/abc.jar,file:///home/hadoop/spark/externaljars/def.jar
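
A sketch of a full command using that form (the class name and jar paths here are just placeholders):

spark-submit \
  --class SparkTest \
  --jars file:///home/hadoop/spark/externaljars/abc.jar,file:///home/hadoop/spark/externaljars/def.jar \
  /home/hadoop/spark/your-program.jar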

Binita Bharati
5

Pass --jars with the paths of the jar files, separated by commas, to spark-submit.

For reference:

--driver-class-path is used to mention "extra" jars to add to the "driver" of the spark job

--driver-library-path is used to "change" the default library path for the jars needed for the spark driver

--driver-class-path will only push the jars to the driver machine. If you want to send the jars to "executors", you need to use --jars
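
A command-line sketch combining those options (the class name and jar paths here are just placeholders):

spark-submit \
  --class yourClass \
  --driver-class-path /path/to/driver-only.jar \
  --jars /path/to/shared1.jar,/path/to/shared2.jar \
  your-app.jar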

And to set the jars programmatically, set the following config: spark.yarn.dist.jars with a comma-separated list of jars.

Eg:

from pyspark.sql import SparkSession

# Jars listed in spark.yarn.dist.jars are distributed to the YARN containers
spark = SparkSession \
        .builder \
        .appName("Spark config example") \
        .config("spark.yarn.dist.jars", "<path-to-jar/test1.jar>,<path-to-jar/test2.jar>") \
        .getOrCreate()
Nandeesh
3

You can use --jars $(echo /Path/To/Your/Jars/*.jar | tr ' ' ',') to include an entire folder of jars:

spark-submit --class com.yourClass \
  --jars $(echo /Path/To/Your/Jars/*.jar | tr ' ' ',') \
  ...

NiharGht
0

For the --driver-class-path option you can use : as a delimiter to pass multiple jars. Below is an example with the spark-shell command, but I guess the same should work with spark-submit as well:

    spark-shell --driver-class-path /path/to/example.jar:/path/to/another.jar

Spark version: 2.2.0

user2720864
0

If you are using a properties file, you can add the following line there:

spark.jars=jars/your_jar1.jar,...

assuming that

<your root from where you run spark-submit>
  |
  |-jars
      |-your_jar1.jar
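
and that the properties file is passed to spark-submit with the --properties-file option. A sketch of such a call (the file name and application jar here are just placeholders):

spark-submit --properties-file spark.properties --class yourClass your_application.jar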
wiesiu_p