
I have a Snappy-compressed Parquet file in the local directory /home/hive/part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet.

The external Hive table below is created without errors, but when select * from parquet_hive123456789 is run, no rows are returned.

CREATE EXTERNAL TABLE parquet_hive123456789 (
  `ip` string,
  `request` string,
  `status` string,
  `userid` string,
  `bytes` string,
  `agent` string,
  `timestamp` timestamp
) STORED AS PARQUET
LOCATION '/home/hive/';

Using parquet-tools I am able to see the contents of the file.

parquet-tools show part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet

+-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------+
| ip              | request                             |   status |   userid |   bytes | agent                                                                                                               | timestamp   |
|-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------|
| 222.203.236.146 | GET /site/user_status.html HTTP/1.1 |      405 |       13 |   14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT         |
| 122.152.45.245  | GET /site/login.html HTTP/1.1       |      407 |        5 |     278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT         |
| 222.152.45.45   | GET /site/user_status.html HTTP/1.1 |      302 |       22 |    4096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT         |
| 222.245.174.248 | GET /index.html HTTP/1.1            |      404 |        7 |   14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)                                            | NaT         |
| 122.173.165.203 | GET /index.html HTTP/1.1            |      200 |       39 |     278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT         |
| 122.168.57.222  | GET /images/logo-small.png HTTP/1.1 |      404 |        2 |   14096 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)                                            | NaT         |
| 122.152.45.245  | GET /images/track.png HTTP/1.1      |      405 |        5 |     278 | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)                                            | NaT         |
| 122.173.165.203 | GET /site/user_status.html HTTP/1.1 |      407 |       39 |   14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT         |
| 222.245.174.248 | GET /images/track.png HTTP/1.1      |      302 |        7 |     278 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT         |
| 122.173.165.203 | GET /site/user_status.html HTTP/1.1 |      200 |       39 |   14096 | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36 | NaT         |
+-----------------+-------------------------------------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------+-------------+

Can somebody please help?


1 Answer

LOCATION should be an HDFS directory, not a local one. A directory like '/home/hive' may also exist in HDFS, but it is a bad idea to use a path like that as a table location. It should be a table-specific directory, because each table's data should live in its own location, separated from other tables. Usually a table directory looks like /user/hadoop/mytable, where mytable is the table name.

Put your file into an HDFS directory, for example like this (use your own path in HDFS):

hdfs dfs -put /home/hive/part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet /user/hadoop/table_dir/

Check that the file exists in HDFS (use your HDFS path):

hdfs dfs -ls '/user/hadoop/table_dir/'

Then create the table (EXTERNAL or MANAGED, it does not matter in this context) with the location in HDFS: '/user/hadoop/table_dir/'.
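
For example, a sketch reusing the column list from the question (adjust '/user/hadoop/table_dir/' to your own HDFS directory):

-- drop the definition from the question first so the new LOCATION takes effect
DROP TABLE IF EXISTS parquet_hive123456789;

CREATE EXTERNAL TABLE parquet_hive123456789 (
  `ip` string,
  `request` string,
  `status` string,
  `userid` string,
  `bytes` string,
  `agent` string,
  `timestamp` timestamp
) STORED AS PARQUET
LOCATION '/user/hadoop/table_dir/';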

Alternatively, you can create the table and then load the local file into it using the LOAD DATA LOCAL INPATH command, as in this answer.
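
A minimal sketch of that approach, assuming the table has already been created; LOAD DATA LOCAL INPATH copies the file from the local filesystem into the table's location:

-- load the local Parquet file into the existing table
LOAD DATA LOCAL INPATH '/home/hive/part-00000-52d40ae4-92cd-414c-b4f7-bfa795ee65c8-c000.snappy.parquet'
INTO TABLE parquet_hive123456789;

Note that when running through Beeline/HiveServer2, the LOCAL path is resolved on the HiveServer2 host, not on your client machine.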
