Current setup:
1. HDInsight for Spark
2. Azure Data Lake Store for HDFS storage
I am trying to load data into a Hive table from partitioned data located on Azure DLS. For writing to the partition location I am using this code:
from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext
from pyspark.sql.types import *
from pyspark.sql.functions import *

sc = SparkContext(appName="Transformation")
sql_context = SQLContext(sc)

# allow dynamic partitioning when writing
sql_context.setConf("hive.exec.dynamic.partition", "true")
sql_context.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
sql_context.setConf("hive.exec.max.dynamic.partitions.pernode", "400")
sql_context.setConf("spark.sql.hive.convertMetastoreParquet", "false")

# current data in ORC format stored on ADLS
df = sql_context.read.orc("adl://LOCATION_NAME")

# writing the data, partitioned by ReportId and ReportDeliveryDate
df.write.format("parquet").partitionBy("ReportId", "ReportDeliveryDate").save("adl://LOCATION_NAME")
I can see the partition data at the location.
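(The layout on ADLS looks roughly like the sketch below; the ReportId and ReportDeliveryDate values and file names are illustrative, not my actual data.)

adl://LOCATION_NAME/ReportId=101/ReportDeliveryDate=2018-01-01/part-00000-<uuid>.snappy.parquet
adl://LOCATION_NAME/ReportId=101/ReportDeliveryDate=2018-01-02/part-00000-<uuid>.snappy.parquet
adl://LOCATION_NAME/ReportId=102/ReportDeliveryDate=2018-01-01/part-00000-<uuid>.snappy.parquet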
Now I am creating the table with the following command in Hive:
DROP TABLE IF EXISTS default.test_tb;
CREATE EXTERNAL TABLE IF NOT EXISTS default.test_tb
(
  A STRING,
  B STRING
)
PARTITIONED BY (ReportId STRING, ReportDeliveryDate STRING)
STORED AS PARQUET
LOCATION 'adl://LOCATION_NAME';
and when I run the query select * from test_tb, there is no output.
I have tried many methods, like the one provided here. Not sure what I am doing wrong.
---- Update ----
The problem here turned out to be different. After a long discussion with the Azure team, it came down to the fact that Hive does not handle uppercase partition column names. Once I created the partitions in lowercase from Spark and then ran the repair table command (MSCK REPAIR TABLE), it worked.
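A minimal sketch of the working flow, keeping the placeholder path and table name from above (the lowercase renames are illustrative; the table DDL would likewise use reportid / reportdeliverydate as the partition columns):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="Transformation")
hive_context = HiveContext(sc)  # HiveContext so Spark can talk to the Hive metastore

# read the ORC source and rename the partition columns to lowercase
df = hive_context.read.orc("adl://LOCATION_NAME")
df = df.withColumnRenamed("ReportId", "reportid") \
       .withColumnRenamed("ReportDeliveryDate", "reportdeliverydate")

# write the Parquet data with lowercase partition directory names
df.write.format("parquet").partitionBy("reportid", "reportdeliverydate").save("adl://LOCATION_NAME")

# register the partition directories that already exist on ADLS with the metastore
hive_context.sql("MSCK REPAIR TABLE default.test_tb")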