I am trying to update a Hive table based on the records from a MySQL table.

MySQL table (table name: delimiter_test):

+---------------+-----------------+
| department_id | department_name |
+---------------+-----------------+
|             2 | Fitness         |
|             3 | Footwear        |
|             4 | Apparel         |
|             5 | Golf            |
|             6 | Outdoors        |
|             7 | Fan Shop        |
|             8 | Test            |
+---------------+-----------------+

Hive table (table name: my_test):

2   Fitness
3   Footwear
4   Apparel
5   Golf
6   Outdoors
7   Fan Shop

I am trying to use Sqoop's incremental append to import the last record in the MySQL table, the one with department_id 8, into the Hive table.

My Sqoop command:

sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username xxx --password xxx \
--table delimiter_test --hive-import --hive-table my_test --split-by department_id \
--check-column department_id --incremental append --last-value 7

I am not getting any errors, but the extra record from the MySQL table with department_id 8 is not getting inserted into the Hive table.

Please suggest where I am going wrong.
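
For reference, this is how I check the Hive table after each run (a minimal sketch, assuming my_test is in Hive's default database):

hive -e "SELECT * FROM my_test ORDER BY department_id;"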

  • Command seems good to me. Can you add `--verbose` at the end of your sqoop command to see extended logs and check for any errors? (A sketch with the flag added follows these comments.) – Dev Dec 28 '16 at 02:04
  • @devツ I noticed something weird: when I set the number of mappers to 10 (--m 10), the extra record with department_id 8 was inserted twice into the Hive table, but with --m 6 it got inserted correctly. – data_person Dec 28 '16 at 02:14
  • Is department_id a string in the Hive table? Share full logs after `--verbose` – Dev Dec 28 '16 at 02:27
  • Also see this: http://stackoverflow.com/a/37389134/2700344 – leftjoin Dec 28 '16 at 14:15
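
A sketch of the same command with --verbose appended, as suggested above (hostname and credentials are the placeholders from the question):

sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" --username xxx --password xxx \
--table delimiter_test --hive-import --hive-table my_test --split-by department_id \
--check-column department_id --incremental append --last-value 7 --verbose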

2 Answers

I don't know if we happen to be working on the itversity labs, but I have done this using the code below, and it may work for you too.

First, load the data into Hive:

sqoop import --connect jdbc:mysql://xxxxx/retail_db --username xxxx --password xxxx \
--table departments --where department_id=2 --hive-import --hive-database poc --hive-table departments_sqoop  \
--target-dir /user/ingenieroandresangel/sqoop/dep_hive --split-by department_id -m 1

Then I perform the update with the script below:

sqoop import --connect jdbc:mysql://xxxxxx/retail_db --username xxxxx --password xxxx \
--table departments --where 'department_id>=2' --hive-import --hive-database poc --hive-table departments_sqoop  --incremental append \
--check-column department_id --last-value 2 --target-dir /user/ingenieroandresangel/sqoop/dep_hive --split-by department_id -m 1
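
To avoid bumping --last-value by hand on every run, the same incremental import can be wrapped in a saved Sqoop job, which records the last imported value in Sqoop's metastore. This is a sketch under the same connection details; the job name dept_incr is made up, and Sqoop may prompt for the database password at --exec time unless a --password-file is supplied:

sqoop job --create dept_incr -- import --connect jdbc:mysql://xxxxxx/retail_db --username xxxx --password xxxx \
--table departments --hive-import --hive-database poc --hive-table departments_sqoop --incremental append \
--check-column department_id --last-value 2 --target-dir /user/ingenieroandresangel/sqoop/dep_hive --split-by department_id -m 1

sqoop job --exec dept_incr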
– Andres Urrego Angel
When appending data directly to the Hive table, use --incremental lastmodified:

sqoop import --connect jdbc:mysql://xxxxxx/mysql_db --username xxxxx --password xxxx \
--table departments --target-dir /sqoop/dep_hive --hive-import --hive-database poc --hive-table departments_sqoop --incremental lastmodified \
--check-column department_id --last-value 2 -m 1

# when appending to a plain HDFS file you can use --incremental append instead

# to overwrite the Hive table you can add --hive-overwrite

# the row matching --last-value itself is excluded; only rows with a greater check-column value are imported
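
A sketch of the overwrite variant mentioned above, which replaces the Hive table's contents instead of appending (same placeholder hostnames and credentials):

sqoop import --connect jdbc:mysql://xxxxxx/mysql_db --username xxxxx --password xxxx \
--table departments --target-dir /sqoop/dep_hive --hive-import --hive-database poc \
--hive-table departments_sqoop --hive-overwrite -m 1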