We are facing a problem while reading Avro files in spark2-shell on Spark 2.4; any pointers would be of great help.
We were using the following method to read Avro files in Spark 2.3, but this support has been removed in Spark 2.4:
spark2-shell --jars /tmp/spark/spark-avro_2.11-4.0.0.jar
import org.apache.avro.Schema
spark.sqlContext.sparkContext.hadoopConfiguration.set("avro.mapred.ignore.inputs.without.extension", "true")
val df = spark.read.format("com.databricks.spark.avro").option("header", "true").option("mode", "DROPMALFORMED").load("<DIR_PATH_FOR_AVRO>")
- The Spark 2.4 documentation (https://spark.apache.org/docs/latest/sql-data-sources-avro.html) suggests the following:
./bin/spark-shell --packages org.apache.spark:spark-avro_2.12:2.4.4
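For reference, this is the read we expect to run on Spark 2.4 once the package resolves. It is only a sketch based on the linked documentation: "avro" is the built-in short format name there, and we believe the ignoreExtension option replaces the old avro.mapred.ignore.inputs.without.extension Hadoop setting, though we have not been able to verify this yet.
// In spark-shell, `spark` is already defined.
// Spark 2.4's built-in Avro source is selected with the short name "avro";
// com.databricks.spark.avro is no longer needed.
val df = spark.read
  .format("avro")
  .option("ignoreExtension", "true") // our understanding of the 2.4 replacement for the Hadoop conf
  .load("<DIR_PATH_FOR_AVRO>")
df.printSchema()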
But we get the following exception when using this approach:
Exception in thread "main" java.lang.RuntimeException:
[unresolved dependency: org.apache.spark#spark-avro_2.12;2.4.4: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1306)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
We have also tried:
spark2-shell --packages org.apache.spark:spark-avro_2.12:2.4.4 --jars /tmp/spark/spark-avro_2.12-2.4.0.jar