Sqoop import having SQL query with where clause

Question

sqoop import --connect jdbc:teradata://192.168.xx.xx/DBS_PORT=1025,DATABASE=ds_tbl_db 
--driver com.teradata.jdbc.TeraDriver 
--username dbc 
--password dbc 
--query 'select * from reason where id>20' 
--hive-import 
--hive-table reason_hive 
--target-dir <hdfs-location> 
-m 1

I got the error:

Query [select * from reason where id>20] must contain '$CONDITIONS' in WHERE clause.

I know there must be a where clause in query for Sqoop.

So, for queries like

select * from reason

I modify it to:

select * from reason WHERE $CONDITIONS

What to do for queries having where clause?

score 34 · Accepted Answer · edited Feb 29 '16 at 06:46

34

You have to add AND \$CONDITIONS

--query "select * from reason where id>20 AND \$CONDITIONS"

Please refer Sqoop documentation .

edited Feb 29 '16 at 06:46

Dev

13,492
19
81
174

answered Feb 29 '16 at 05:29

vinayak_narune

737
1
10
26

6

`AND \$CONDITIONS` is need if you have enclosed your query in double quotes. otherwise use `AND $CONDITIONS` – Dev Feb 29 '16 at 07:12

Nikhil Bhide · Answer 2 · 2016-12-30T10:00:05.290

Sqoop requires to access metadata of table for example column type information. Placeholder $CONDITIONS is by default set to '1 = 0' to ensure that sqoop receives only type information. So, after executing sqoop command you will see first query that gets fired is with default $CONDITIONS. Later on, it is substituted by different values defining different ranges based on number of mappers (-m) or --split-by column or --boundary-query so that entire data set can be divided into different data slices or chunks and chunks can be imported in parallel with as much as concurrency available. Sqoop will automatically substitute this placeholder with the generated conditions specifying which slice of data should be transferred by each individual task

For example, consider sample_data table with columns name, id and salary. You want to fetch records with salary > 1k.

 sqoop import \ 
    --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
    --username retail_dba --password cloudera \
    --query 'select * from sample_data where $CONDITIONS AND salary > 1000' \
    --split-by salary \
    --target-dir hdfs://quickstart.cloudera/user/cloudera/sqoop_new

Following is first query which returns empty set.

SqlManager: Executing SQL statement: select * from sample_data where  (1 = 0)  AND salary > 1000

Then next query is to get min and max of range.

INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(salary), MAX(salary) FROM (select * from sample_data where  (1 = 1)  AND salary > 1000) AS t1;

score 2 · Answer 3 · answered Oct 25 '17 at 16:50

2

You can use Where Clause

--where "order_status = 'CLOSED'"

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html

answered Oct 25 '17 at 16:50

RajenDharmendra

21
3

score 1 · Answer 4 · answered Sep 03 '20 at 06:52

1

You can set as like below

If database is oracle then select * from (query) where $CONDITIONS else select * from (query) as T where $CONDITIONS

Please let me know if in case any issue.

answered Sep 03 '20 at 06:52

Balkrushna Patil

418
6
12

score 0 · Answer 5 · edited Dec 07 '18 at 15:55

0

I am working in cloudera with querying list of tables in Mysql.
I got results with below listed query:

sqoop import --connect jdbc:mysql://127.0.0.1/Mydatabase
    --username root --password cloudera 
    --query 'select * from employee where $CONDITIONS AND Sal<250000' 
    --split-by Sal --target-dir=user/cloudera/myfirstdata2 -m 1

edited Dec 07 '18 at 15:55

Andrey Nikolov

12,967
3
20
32

answered Dec 07 '18 at 11:17

user3098458

1

score 0 · Answer 6 · answered May 25 '20 at 19:58

when you are using free form query, just make sure you are place '\' before $CONDITIONS and you are good to do. Here is an example I have tried on Cloudera Quickstart VM and it worked fine.

sqoop import --connect jdbc:mysql://quickstart.cloudera/retail_db --username retail_dba --password cloudera --query "select customers.customer_fname, customers.customer_lname, orders.order_id, orders.order_date from customers join orders on (customers.customer_id = orders.order_customer_id) WHERE \$CONDITIONS" --as-parquetfile --split-by "orders.order_customer_id" --target-dir "/user/cloudera/problem2/data/cusotmers"

Please refer to this link for detailed documentation: https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#_free_form_query_imports

Sqoop import having SQL query with where clause

6 Answers6

Linked