
I am using PySpark from a notebook and I do not handle the creation of the SparkSession. I need to load a jar containing some functions I would like to use while processing my RDDs. This is something you can easily do with --jars, which I cannot use in my particular case. Is there a way to access the Spark Scala context and call the addJar method? I tried to use the JavaGateway (sparksession._jvm...) but have not been successful so far. Any idea?

Thanks Guillaume

tog

3 Answers


sparksession._jsc.addJar does the job.
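For example, on a notebook-provided session (the jar path below is a placeholder):

# 'spark' is the SparkSession the notebook already created; _jsc is the
# py4j handle to the underlying JavaSparkContext, which exposes addJar.
spark._jsc.addJar("/path/to/my-functions.jar")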

tog
  • ... but it solves my problem only partially, as the jar is not available on my driver node! – tog Mar 24 '17 at 05:05
  • I found this very useful post: http://stackoverflow.com/questions/37132559/add-jars-to-a-spark-job-spark-submit – tog Mar 24 '17 at 05:07

You can try this method, which will distribute the file to all nodes:

spark.sparkContext.addFile("filename")
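
A file shipped this way can then be located on any node through SparkFiles; a minimal sketch (the filename is a placeholder):

from pyspark import SparkFiles

spark.sparkContext.addFile("filename")
local_path = SparkFiles.get("filename")  # local copy on whichever node this runs

Note that addFile only copies the file around; it does not put a jar on the JVM classpath.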
dirceusemighini

Distribute xxx.jar to the executors with addJar, and put it on the driver classpath with extraClassPath:

from pyspark.sql import SparkSession

spark = SparkSession.builder.config('spark.driver.extraClassPath', 'xxx.jar').getOrCreate()  # driver classpath
spark.sparkContext._jsc.addJar('/xxx/xxx/xxx.jar')  # ship the jar to the executors
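
Once the jar is on both the driver and executor classpaths, its classes can be reached through the py4j gateway; MyFunctions below is a hypothetical class assumed to live in xxx.jar:

# Hypothetical class from the jar, called via the JVM gateway.
result = spark._jvm.com.example.MyFunctions.process("some input")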