
Apache Drill has a nice feature of producing Parquet files out of many kinds of incoming datasets, but it seems there is not a lot of information on how to use those Parquet files later on, specifically in Hive.

Is there a way for Hive to make use of those "1_0_0.parquet", etc. files? Maybe create a table and load the data from the Parquet files, or create a table and somehow place those Parquet files inside HDFS so that Hive reads them?

  • Possible duplicate of [Dynamically create Hive external table with Avro schema on Parquet Data](http://stackoverflow.com/questions/34181844/dynamically-create-hive-external-table-with-avro-schema-on-parquet-data) – Ani Menon Jan 13 '17 at 04:19
  • Unfortunately Apache Drill does not create Avro schema, are you suggesting that I manually create one? – Pavel Jan 13 '17 at 04:33
  • Yes.. Refer http://kitesdk.org/docs/0.17.1/labs/4-using-parquet-tools-solution.html – Ani Menon Jan 13 '17 at 04:45
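
As the comments suggest, one way to get a schema to work from is to inspect the Parquet files Drill produced. A minimal sketch using the parquet-tools CLI (the file path here is illustrative, not from the question):

parquet-tools schema /user/drill/output/1_0_0.parquet

The printed schema can then be translated by hand into Hive column definitions (or into an Avro schema, per the Kite SDK approach linked above).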

1 Answer


I have faced this problem. If you are using a Cloudera distribution, you can create the tables using Impala (Impala and Hive share the metastore); it allows creating a table directly from a Parquet file. Unfortunately, Hive doesn't allow this:

CREATE EXTERNAL TABLE table_from_file
LIKE PARQUET '/user/etl/destination/datafile1.parquet'
STORED AS PARQUET
LOCATION '/user/test/destination';
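
If Impala is not available, a common alternative is to declare the schema by hand in Hive and point an external table at the directory containing Drill's output files ("1_0_0.parquet" and friends). A minimal sketch, assuming the files live under /user/drill/output and contain the columns shown (the column names and types here are illustrative, not taken from the question):

CREATE EXTERNAL TABLE drill_output (
  id BIGINT,
  name STRING,
  created_at TIMESTAMP
)
STORED AS PARQUET
LOCATION '/user/drill/output';

Hive resolves Parquet columns by name, so the declared column names must match the field names inside the files; every Parquet file under the LOCATION directory is then readable through the table without any explicit load step.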