
I already have a SparkContext created and a `spark` global variable. Reading ORC files is as simple as spark.read.format("orc").load("filepath"), but for Avro the same approach fails, even though I try to add the jar like so:

    spark.conf.set("spark.jars.packages",
    "file:///projects/apps/lib/spark-avro_2.11-3.2.0.jar")

and then try to read the Avro file. I get an error like so:

    Py4JJavaError: An error occurred while calling o65.load.
    : org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;
– user2896120

1 Answer


spark.jars.packages takes Maven-style coordinates (group:artifact:version), not local jar file paths:

spark.jars.packages  org.apache.spark:spark-avro_2.12:2.4.2

Additionally, as explained in How to load jar dependencies in IPython Notebook, it has to be set before the JVM and the SparkSession / SparkContext are initialized.

So you have to:

  • Fix the setting (use the Maven coordinate, not a file path).
  • Provide it as a configuration option or environment variable before the JVM is initialized.
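Putting both points together, one way to do this from a fresh Jupyter kernel is roughly the following sketch. It assumes Spark 2.4+ (where the built-in format name "avro" matches the coordinate in the answer); the app name and file path are placeholders:

```python
import os

# Must run before any SparkSession/SparkContext is created in this
# process: once the JVM is up, spark.jars.packages is ignored.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages org.apache.spark:spark-avro_2.12:2.4.2 pyspark-shell"
)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro-demo").getOrCreate()

# With the package on the classpath, the avro source resolves.
df = spark.read.format("avro").load("filepath")
```

The same `--packages` flag can equally be passed on the `pyspark` / `spark-submit` command line instead of through the environment variable.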
  • Is it possible to destroy the current Spark context that was initialized and create a new one with the jar within Jupyter? – user2896120 Apr 25 '19 at 15:12
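Regarding the comment: stopping and rebuilding the session can be attempted as sketched below, but with a caveat. In PySpark the JVM gateway launched by the first session can outlive SparkContext.stop(), in which case spark.jars.packages set afterwards is still ignored; restarting the Jupyter kernel is the dependable route. A hedged sketch (the coordinate is the one from the answer above):

```python
from pyspark.sql import SparkSession

# Stop the existing session/context...
spark.stop()

# ...then rebuild with the package configured. Caveat: if the original
# JVM gateway is still alive, this config may have no effect; a kernel
# restart guarantees the package is resolved at JVM startup.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:2.4.2")
    .getOrCreate()
)
```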