I could also have asked: how can I avoid the error "Avro is built-in but external data source module since Spark 2.4"?
I have been using the following approach to bootstrap my session in JUnit (this approach works for all my other tests):
sparkSession = SparkSession.builder().appName("testings")
        .master("local[2]")
        .getOrCreate();
But when I try to read an Avro file into a DataFrame:
final Dataset<Row> df = sparkSession.read().format("avro").load(inputFilePath.toString());
I get the following exception:
org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;
In the Spark documentation it mentions that for spark-submit or spark-shell we should use the --packages option.
How do I include --packages for a JUnit test?
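If it helps answerers: is setting spark.jars.packages on the builder the programmatic equivalent of --packages? Here is a sketch of what I mean; the coordinates, and the 2.4.0 version in particular, are my guess and would need to match the Spark version on the test classpath:

sparkSession = SparkSession.builder().appName("testings")
        .master("local[2]")
        // spark.jars.packages resolves Maven coordinates when the session
        // starts, much like --packages does for spark-submit
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.0")
        .getOrCreate();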
I have even included the Maven dependency just to be safe:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_2.11</artifactId>
    <version>${spark-core.version}</version>
</dependency>
I would like to add that the following code works in my unit test:
final Dataset<Row> df = sparkSession.read().format("org.apache.spark.sql.avro.AvroFileFormat").load(inputFilePath.toString());