I have imported data from MySQL into Hive using the following Sqoop command:
sqoop import --connect jdbc:mysql://mysql_server/db1 --username userid --password passwd --query 'select col1, col2, sum(col3), sum(col4), sum(col5), sum(col6) from table1 WHERE $CONDITIONS group by col1, col2' --split-by col1 --hive-import --hive-home /root/dev/hive-0.11.0 --create-hive-table --hive-table hive_table1 --target-dir imported_data
The data was imported successfully, and I was able to query the imported table as well. But the next day, when I connected to Hive and ran show tables, it returned nothing. There is only one database in Hive, so there is no chance of confusion there.
My concern is: do internal tables get deleted once I disconnect from Hive? And is there any way to make the imported data reside only in Hive, nowhere else on HDFS?

2 Answers
I finally resolved the issue. It involves two steps:
- Create an external Hive table.
- Import the data using Sqoop.
Creation of the external table: External tables in Hive are effectively permanent; they stay in place even if Hive is stopped or the server goes down. The "EXTERNAL" keyword is used to specify the table type.
CREATE EXTERNAL TABLE IF NOT EXISTS HIVEDB.HIVE_TABLE1 (DATE_COL DATE,
BIG_INT_COL BIGINT, INT_COL INT, VARCHAR_COL VARCHAR(221), FLOAT_COL FLOAT);
Import the data using Sqoop: Specify the already-created table name while importing the data, instead of using the "--create-hive-table" option.
sqoop import --connect jdbc:mysql://mysqlhost/mysqldb --username user --password
passwd --query "SELECT table1.date_col, table1.big_int_col, table1.int_col,
table1.varchar_col, table1.float_col FROM MYSQL_TABLE1 AS table1 WHERE
\$CONDITIONS" --split-by table1.date_col --hive-import
--hive-table hivedb.hive_table1 --target-dir hive_table1_data
With this setup, the data is stored permanently in Hive.

Just some clarifications here:
- Tables created in Hive are always stored on HDFS internally. Otherwise, what would be the point of using Hive instead of a traditional RDBMS?
- The EXTERNAL keyword has a very different meaning: it means the table's data is not stored under the purview of Hive. It is useful if the data you want a table over is already present in HDFS. Instead of creating a copy of the data, you can specify an HDFS path that acts as the data source for the table.
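To illustrate that second point, here is a hedged sketch of an external table pointed at pre-existing data (the path /user/hadoop/imported_data and the column names are hypothetical, chosen only for the example):

```sql
-- Sketch: table over data that already exists in HDFS.
-- Hive will read from the given LOCATION instead of copying the files;
-- dropping the table removes only the metadata, not the files.
CREATE EXTERNAL TABLE IF NOT EXISTS hivedb.existing_data_table (
  date_col DATE,
  int_col  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hadoop/imported_data';
```

Because the table is external, `DROP TABLE` leaves the underlying files in `/user/hadoop/imported_data` untouched.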
Also, assuming you have default settings for Hive, it is quite common to see tables disappear. This generally happens because, in the default configuration, the embedded Derby database stores the table metadata in the directory from which Hive was launched; start Hive from a different directory and your tables seem to vanish. Here is a simple solution: Hive is not showing tables
A better solution is https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode
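A common workaround for the embedded-Derby case, before moving to server mode, is to pin the metastore to an absolute path in hive-site.xml so the metadata no longer depends on the launch directory. This is a sketch under the assumption of a default Derby setup; the path /var/lib/hive/metastore_db is a placeholder you would adapt:

```xml
<!-- hive-site.xml: point the embedded Derby metastore at a fixed,
     absolute location instead of the current working directory. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore_db;create=true</value>
</property>
```

With this in place, `show tables` returns the same tables regardless of which directory Hive was started from. Derby server mode (the link above) is still the better fix if more than one client needs the metastore.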
