How do I use Sqoop to import data from a relational DB into Hive in the Sandbox? I have installed the Hortonworks Sandbox on my PC and now want to learn how to do this migration. I have referred to this link: http://hortonworks.com/kb/using-apache-sqoop-for-data-import-from-relational-dbs/ but I am still confused about a few points: 1. Do I need any separate Sqoop software to run this? 2. The link above shows some code; where do I put this code? In the Hive query window? 3. Is it possible to migrate a database fully, or only one table at a time? 4. What about stored procedures and views? Where do I keep all of these?
2 Answers
Sqoop comes preinstalled with the latest Hortonworks Sandbox, so you just have to do the following. In some cases you may first need to grant remote-access permissions on the database side (otherwise you get errors like "Host 'xxx.xx.xxx.xxx' is not allowed to connect to this MySQL server"). Then proceed as follows to import and export data between an RDBMS (MySQL, Oracle, etc.) and HDFS / Hive / HBase.
Prerequisites
- Apache Hadoop
- Apache Sqoop (compatible with Hadoop version)
- Apache Hive (optional)
- Apache HBase (optional)
- Apache HCatalog (optional)
- JDBC/ODBC connector
For every RDBMS only the connection URL changes; all the remaining command-line arguments stay the same. You need to download the database-specific JDBC/ODBC connector JAR and copy it to $SQOOP_HOME/lib.
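As a rough template (a sketch; the angle-bracket placeholders are illustrative, not real values), every import below has the same shape and only the JDBC URL changes. For most databases the URL looks like jdbc:<rdbms>://<host>:<port>/<database>; SQL Server and Oracle use their own URL syntax, shown in their sections below.
sqoop import --connect jdbc:<rdbms>://<host>:<port>/<database> --username <user> -P --table <table_name> --target-dir /path/in/hdfs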
MySQL
Download mysql-connector-java.jar and place it in the $SQOOP_HOME/lib folder:
cp mysql-connector-java-5.1.18-bin.jar /usr/local/hadoop/sqoop-1.4.3-cdh4.4.0/lib/
This assumes you already have data in MySQL tables.
Retrieve the list of databases available in MySQL via Sqoop:
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root -P
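In the same spirit (a small sketch reusing the illustrative credentials above), you can also list the tables of a given database before importing:
sqoop list-tables --connect jdbc:mysql://localhost:3306/db1 --username root -P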
MySQL to HDFS Import
With a primary key:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /path/to/directoryName
Without a primary key:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /path/to/directoryName -m 1
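If a table has no primary key but does contain a column with evenly distributed values, you can keep parallel mappers by naming that column with --split-by instead of dropping to a single mapper (a sketch; column1 is just an illustrative column name):
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /path/to/directoryName --split-by column1 -m 4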
MySQL to Hive Import
With a primary key:
sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home /path/to/hive_home
Without a primary key:
sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home /path/to/hive_home -m 1
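If the Hive table already exists and you only want to refresh its contents, a common variation is to drop --create-hive-table and add --hive-overwrite (a sketch reusing the same illustrative names):
sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --hive-import --hive-overwrite -m 1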
MySQL to HBase Import
Import all columns:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-create-table
Import only a few columns:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table
To HBase with a primary key:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 --hbase-create-table
To HBase with no primary key:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table
Export from HDFS to MySQL:
This is the same for Hive, HBase, and HDFS, because Hive tables are nothing but directories in HDFS; you are just exporting a directory to MySQL:
sqoop export --connect jdbc:mysql://localhost:3306/test_db --table tableName --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'
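If the MySQL table already holds rows and you want the export to update them instead of inserting duplicates, Sqoop's --update-key option can be added (a sketch; id is an illustrative key column):
sqoop export --connect jdbc:mysql://localhost:3306/test_db --table tableName --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001' --update-key id --update-mode allowinsert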
SQL Server
Connection URL (the angle-bracket placeholders are illustrative; substitute your own values):
sqoop import --connect 'jdbc:sqlserver://<host>;username=<dbuser>;password=<dbpasswd>;database=<dbname>' --table <table_name> --target-dir /path/to/hdfs/dir --split-by <column> -m 1
Download the connector from the Microsoft website:
http://www.microsoft.com/en-us/download/confirmation.aspx?id=11774
Place it in $SQOOP_HOME/lib
Oracle
Connection URL:
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=myhost)(port=1521))(connect_data=(service_name=myservice)))" \
--username USER --table SCHEMA.TABLE_NAME --hive-import --hive-table SCHEMA.TABLE_NAME \
--num-mappers 1 --verbose -P
IBM DB2
Download the DB2 JDBC driver and place it in $SQOOP_HOME/lib
sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/testdb --username db2user --password db2pwd --table db2tbl --split-by tbl_primarykey --target-dir sqoopimports
sqoop export --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/myDB --username db2user --password db2pwd --table db2tbl --export-dir /sqoop/dataFile.csv
Different connection strings for different RDBMSs:
Database      Version    --direct support?    Connect string matches
HSQLDB        1.8.0+     No                   jdbc:hsqldb:*//
MySQL         5.0+       Yes                  jdbc:mysql://
Oracle        10.2.0+    No                   jdbc:oracle:*//
PostgreSQL    8.3+       Yes (import only)    jdbc:postgresql://
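For example, a PostgreSQL import can use the --direct path noted in the table above (a sketch with illustrative database and table names; the PostgreSQL JDBC JAR must also be in $SQOOP_HOME/lib):
sqoop import --connect jdbc:postgresql://localhost:5432/db1 --username postgres -P --table tableName --direct --target-dir /path/to/directoryName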

1. You will need the Microsoft SQL Server Connector for Apache Hadoop to do that, which can be downloaded from here.
2. You have to issue the import command from the Sqoop shell. I would suggest you go through the Sqoop documentation once, especially section 7.2.10, which explains importing data into Hive.
3. You could (see the sketches after this list):
- import all tables from a database, using import-all-tables;
- import one table, using sqoop-import;
- import part of a table (specific rows or specific columns), using the --where or --columns arguments, respectively;
- export data from HDFS to a relational DB, using sqoop-export.
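For example (sketches with illustrative database, table, and column names):
sqoop import-all-tables --connect jdbc:mysql://localhost:3306/db1 --username root -P --hive-import
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root -P --table tableName --where "column1 > 100" --target-dir /path/to/directoryName
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root -P --table tableName --columns "column1,column2" --target-dir /path/to/directoryName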
4. Once your data is inside the Hadoop cluster you can no longer use your SQL stored procedures. Stored procedure languages typically do much of their logic with result sets; because Hive result sets are typically large, aggregations and transformations are done either with functions or with streaming, so you need to think about an alternative. But you can have views in Hive. To find more on views in Hive you can go here.
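As a small sketch of a Hive view (illustrative view, table, and column names), run from the Hive shell or via hive -e:
hive -e "CREATE VIEW my_view AS SELECT column1, column2 FROM tableName WHERE column1 > 100;"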
HTH
