
I could have asked: how can I avoid the error "Avro is built-in but external data source module since Spark 2.4"?

I have been using the following approach to bootstrap my SparkSession in JUnit (this approach works for all my other tests):

sparkSession = SparkSession.builder().appName("testings")
      .master("local[2]")
      .config("", "")
      .getOrCreate();

But when I try to read an Avro file into a DataFrame:

final Dataset<Row> df = sparkSession.read().format("avro").load(inputFilePath.toString());

I get the following exception:

org.apache.spark.sql.AnalysisException: Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;

In the Spark documentation it mentions that for spark-submit or spark-shell we should use the --packages option.
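
For reference, the invocation in the "Apache Avro Data Source Guide" looks like this (the 2.4.0 version here is my assumption; it should match the Spark version in use, and the trailing ... stands for the application jar and its arguments):

./bin/spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 ...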

How do I include --packages for a JUnit test?

I have even included the Maven dependency just to be safe:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_2.11</artifactId>
    <version>${spark-core.version}</version>
</dependency>

I would like to add that the following code works in my unit test:

final Dataset<Row> df = sparkSession.read().format("org.apache.spark.sql.avro.AvroFileFormat").load(inputFilePath.toString());
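
Putting the pieces together, this is roughly what the passing test looks like (the test class name and file path below are placeholders, not my real ones):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.junit.Test;

public class AvroReadTest {

    @Test
    public void readsAvroIntoDataFrame() {
        // Same local session bootstrap that works for my other tests
        SparkSession sparkSession = SparkSession.builder()
                .appName("testings")
                .master("local[2]")
                .getOrCreate();

        // The fully qualified format name resolves; the short name "avro" does not
        Dataset<Row> df = sparkSession.read()
                .format("org.apache.spark.sql.avro.AvroFileFormat")
                .load("src/test/resources/sample.avro"); // placeholder path

        df.show();
    }
}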
