
The Spark programming guide (http://spark.apache.org/docs/latest/programming-guide.html) indicates that packages can be included when the shell is launched via:

$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0

What is the syntax for including local packages (say, ones that have been downloaded manually)? Is it something to do with Maven coordinates?

mathtick

2 Answers


If the jars are already present on the master and the workers, you simply need to specify them on the classpath when launching spark-shell (or spark-submit):

spark-shell \
--conf "spark.driver.extraClassPath=/path/to/jar/spark-csv_2.11.jar" \
--conf "spark.executor.extraClassPath=/path/to/jar/spark-csv_2.11.jar"
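
Equivalently, the driver-side entry can be passed with the --driver-class-path shorthand. A minimal sketch, assuming the jar sits at the same path on every node (the path is a placeholder):

# the path below is a placeholder for wherever the jar actually lives on each node
spark-shell \
--driver-class-path "/path/to/jar/spark-csv_2.11.jar" \
--conf "spark.executor.extraClassPath=/path/to/jar/spark-csv_2.11.jar"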

If the jars are only present on the master and you want them to be shipped to the workers (this only works in client mode), you can add the --jars flag:

spark-shell \
--conf "spark.driver.extraClassPath=/path/to/jar/spark-csv_2.11.jar" \
--conf "spark.executor.extraClassPath=spark-csv_2.11.jar" \
--jars "/path/to/jar/jary.jar,/path/to/other/other.jar"
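
Note that --jars does not resolve transitive dependencies the way --packages does, so every dependency jar has to be listed explicitly. A minimal sketch, assuming spark-csv and its commons-csv dependency were both downloaded by hand into /tmp/jars (the paths, file names and versions are illustrative):

# /tmp/jars and the file names below are placeholders for the jars you downloaded
spark-shell \
--jars "/tmp/jars/spark-csv_2.11-1.4.0.jar,/tmp/jars/commons-csv-1.1.jar"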

For a more elaborate answer, see Add jars to a Spark Job - spark-submit.

Yuval Itzchakov

Please use:

./spark-shell --jars my_jars_to_be_included
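
A concrete sketch of that flag, assuming two manually downloaded jars in the current directory (the file names are placeholders); --jars takes a comma-separated list:

# file names below are placeholders for the jars you downloaded manually
./spark-shell --jars spark-csv_2.11-1.4.0.jar,commons-csv-1.1.jar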

There is an open question related to this; please check that question out.

dbustosp