
I am trying to import data from MySQL into a Hive table using incremental import, but my table has no primary key. I have created a Sqoop job.

Sample MySQL table data:

| year   | sales |
| ------ | ----- |
| Oct-14 |   581 |
| Nov-14 |   519 |
| Dec-14 |   605 |
| Jan-15 |   484 |
| Feb-15 |   584 |
| Mar-15 |   684 |

The first column is of String datatype and the second is INT.

My Sqoop job is:

sqoop job \
--create test13 \
-- import \
--append \
--connect jdbc:mysql://localhost/tractor_sales --username root --P \
--query 'SELECT t.*,@rownum := @rownum + 1 AS rank FROM tractor_sales t,(SELECT @rownum := 0) r where $CONDITIONS' \
--split-by year \
--merge-key @rownum \
--check-column @rownum \
--hive-database salesforecast --hive-table tractor_sales \
--incremental append  \
--last-value 0 \
--hive-import \
--target-dir /user/cloudera/test;

When trying to execute the job, I get the following error:

'Every derived table must have its own alias'
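For context, the `@rownum` trick in the query exists to assign a monotonically increasing surrogate key to a table that has no primary key, so that Sqoop has a column to split on and check against. A minimal Python sketch of that idea (an illustration only, not Sqoop itself):

```python
# Sketch: assign a monotonically increasing surrogate key ("rank")
# to rows that lack a primary key, mimicking the
# @rownum := @rownum + 1 pattern in the MySQL query.
rows = [("Oct-14", 581), ("Nov-14", 519), ("Dec-14", 605)]

# enumerate(..., start=1) plays the role of the @rownum counter.
ranked = [(year, sales, rank) for rank, (year, sales) in enumerate(rows, start=1)]

for year, sales, rank in ranked:
    print(year, sales, rank)
```

Because the rank only ever increases, it can serve as the `--check-column` for incremental append, just as the query intends.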

  • Could you please post the MySQL table schema as well? – Sandeep Singh Jul 16 '17 at 12:29
  • @SandeepSingh Fields: year, sales. Data types: year VARCHAR(10), sales INT(11). – Rajesh Jul 19 '17 at 07:00
  • `split-by` can only be performed on an integer column. Please change it to sales. – Sandeep Singh Jul 19 '17 at 10:59
  • Sometimes I have faced the count issue with rownum so validate the result carefully. – Sandeep Singh Jul 19 '17 at 11:00
  • @SandeepSingh Thanks for your input. The values in the sales column can be repetitive; I mean I can get the same value for two or more months. Does using --split-by on sales still satisfy my requirement? – Rajesh Jul 20 '17 at 06:08
  • Yes, there won't be any issue. – Sandeep Singh Jul 20 '17 at 06:11
  • @SandeepSingh Here the error is related to the query I used. If I remove --query and use --table instead, my --merge-key and --check-column change to sales. When I created a Sqoop job and executed it, it created boundary values. For example, in the existing data the maximum value of sales is 800, and it stores that value in the metastore. If new rows are inserted with 700, 850, 900, it takes 850 and 900, ignoring 700. – Rajesh Jul 20 '17 at 06:20
  • It should not ignore rows; however, I recommend going with one mapper for incremental import. – Sandeep Singh Jul 20 '17 at 06:32
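The behaviour described in the comments can be modelled simply: incremental append only imports rows whose check-column value exceeds the stored last-value. With sales as the check column and a stored maximum of 800, a new row with sales 700 falls below the threshold and is skipped. A simplified sketch of that mechanism (a model for illustration, not Sqoop's actual code):

```python
def incremental_append(rows, last_value):
    """Simplified model of --incremental append: keep only rows whose
    check-column value is strictly greater than the stored last-value,
    then advance last-value to the new maximum."""
    new_rows = [r for r in rows if r > last_value]
    new_last = max(new_rows, default=last_value)
    return new_rows, new_last

# Stored last-value is 800; new sales rows 700, 850, 900 arrive.
picked, last = incremental_append([700, 850, 900], 800)
print(picked, last)  # 700 is skipped because it does not exceed 800
```

This is why a non-monotonic column like sales is unreliable as a check column: values below the stored maximum are never picked up, whereas a strictly increasing surrogate key avoids the problem.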

0 Answers