I have a set of Avro based hive tables and I need to read data from them. As Spark-SQL uses hive serdes to read the data from HDFS, it is much slower than reading HDFS directly. So I have used data bricks Spark-Avro jar to read the Avro files from underlying HDFS dir.
Everything works fine except when the table is empty. I have managed to get the schema from the .avsc file of hive table using the following command but I am getting an error "No Avro files found"
val schemaFile = FileSystem.get(sc.hadoopConfiguration).open(new Path("hdfs://myfile.avsc"));
val schema = new Schema.Parser().parse(schemaFile);
spark.read.format("com.databricks.spark.avro").option("avroSchema", schema.toString).load("/tmp/myoutput.avro").show()
Workarounds:
I have placed an empty file in that directory and the same thing works fine.
Are there any other ways to achieve the same? like conf setting or something?