
I just started hands-on with Sqoop. I have a question: let's say I have 300 tables in a database and I want to perform an incremental load on those tables. I understand I can do incremental imports in either append or lastmodified mode.

But do I have to create 300 jobs, if the only things that vary between jobs are the table name, the CDC column, and the last value/updated value?

Has anyone tried using the same job and passing these values as parameters, read from a text file in a loop, so the same job can be executed for all the tables in parallel?
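To make the idea concrete, here is a rough sketch of the kind of wrapper I have in mind; the connection string, password file path, target directories, and the tables.txt format (one "table,check_column,last_value" triple per line) are all placeholders:

```bash
#!/bin/bash
# Read one "table,check_column,last_value" triple per line from tables.txt
# and launch an incremental append import per table in the background.
# (In practice you would throttle this rather than fork 300 imports at once.)
while IFS=',' read -r TABLE CHECK_COL LAST_VAL; do
  sqoop import \
    --connect "jdbc:mysql://dbhost/mydb" \
    --username myuser \
    --password-file /user/hadoop/.db_password \
    --table "$TABLE" \
    --incremental append \
    --check-column "$CHECK_COL" \
    --last-value "$LAST_VAL" \
    --target-dir "/data/raw/$TABLE" &
done < tables.txt
wait   # block until all background imports finish
```

I am aware that a saved job (sqoop job --create ... -- import ...) updates its stored --last-value automatically after each run, which would remove the need to track it in the text file, but that still seems to require one saved job per table.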

What is the industry standard, and what do you recommend?

Also, for Hadoop tables that are very small, is there a way to truncate and reload them instead of performing CDC and merging the tables later?

1 Answer


There is import-all-tables ("Import tables from a database to HDFS"). However, it does not provide a way to set a different CDC column for each table. Also see sqoop import multiple tables.
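For example (the connection string, credentials, directories, and excluded table names below are placeholders):

```bash
# Import every table in the database into one warehouse directory;
# --exclude-tables skips tables that need special handling.
sqoop import-all-tables \
  --connect "jdbc:mysql://dbhost/mydb" \
  --username myuser \
  --password-file /user/hadoop/.db_password \
  --warehouse-dir /data/raw \
  --exclude-tables audit_log,tmp_staging \
  -m 4
```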

There is no truncate option, but the same can be achieved with --delete-target-dir ("Delete the import target directory if it exists").
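A minimal sketch for a small table (table and directory names are placeholders); note that --delete-target-dir works with a plain full import, not together with --incremental:

```bash
# Full reload: drop the existing directory and re-import the whole table.
sqoop import \
  --connect "jdbc:mysql://dbhost/mydb" \
  --username myuser \
  --password-file /user/hadoop/.db_password \
  --table small_dim_table \
  --target-dir /data/raw/small_dim_table \
  --delete-target-dir \
  -m 1
```

For very small tables this is usually simpler and cheaper than maintaining CDC state and running a merge step.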
