Disclaimer: I am new to Hive, and this is not a duplicate of "Create Hive table to read parquet files from parquet/avro schema" (I already tried that solution).
I have a Spark job that continuously writes to HDFS in Parquet format, and I am trying to load that data into Hive so that I can query it easily (that is my expectation).
I am saving the files as Parquet in hdfs://X.X.X.X:54310/home/hduser/spark/testLogs/.
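For context, the writing side of the job looks roughly like this (a sketch, not my exact code; logDF stands in for the DataFrame the job actually builds from the logs):

import org.apache.spark.sql.DataFrame

// Sketch of the writing side: append Parquet part files to the HDFS
// directory that the Hive table is supposed to point at.
def writeBatch(logDF: DataFrame): Unit = {
  logDF.write
    .mode("append")  // keep adding part files as the job runs
    .parquet("hdfs://X.X.X.X:54310/home/hduser/spark/testLogs/")
}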
However, when I try to load those Parquet files into a Hive table, I cannot get any data out of it. I am creating an external Hive table with the following command, but when I query it, no data is returned.
"CREATE EXTERNAL TABLE IF NOT EXISTS log ( ipAddress STRING," +
"logLevel STRING," +
"userID STRING," +
"dateTimeString STRING," +
"method STRING," +
"endpoint STRING, " +
"protocol STRING," +
"responseCode INT," +
"content STRING," +
"trackingId STRING" +
") STORED AS PARQUET LOCATION 'hdfs://X.X.X.X:54310/home/hduser/spark/testlog/'");
Also, when I try to load a file into the table manually, I get the following error:
load data inpath "hdfs://X.X.X.X:54310/home/hduser/spark/testlog/part-r-00000-29ad05a5-ca12-4332-afd0-39eb337a1acd.parquet" into table log;
I executed the query both with the LOCAL keyword and without it, and got:
FAILED: SemanticException Line 1:17 Invalid path .... No files matching path
Has anybody come across a situation like this? Am I missing something? Please point me in the right direction; any suggestion is welcome.
PS: I cannot load any file format at all, not even CSV or TXT into a table of the matching storage type.
Also, if anyone knows how to write streaming RDD data from Spark into Hive, please tell me how; what I have in mind is sketched below.
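This is roughly what I am picturing (a sketch only, assuming Spark 1.x streaming with a HiveContext; Log, parseLog, and logStream are placeholders, not working code):

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.dstream.DStream

// Abridged placeholder; the real case class would need all ten table columns.
case class Log(ipAddress: String, logLevel: String, userID: String)

def streamToHive(logStream: DStream[String],
                 hiveContext: HiveContext,
                 parseLog: String => Log): Unit = {
  import hiveContext.implicits._
  // foreachRDD runs on the driver once per micro-batch.
  logStream.foreachRDD { rdd =>
    // insertInto appends the batch to the existing Hive table by column position.
    rdd.map(parseLog).toDF().write.insertInto("log")
  }
}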