I'm trying to import data from an Oracle database into HDFS using Sqoop 1.4.6. I have no problem doing simple imports into HDFS or Hive tables using --table table_name or --query "SELECT <...>" statements.
My question, however: is there a way to import a table after first executing a certain PL/SQL procedure?
For example, assume there exists a procedure set_date(date IN DATE), defined in the Oracle database, that sets the report date for the data. With the usual JDBC tools (e.g., the ROracle package when working from R), two queries on the same connection are required to pull the data:
1) BEGIN set_date(#some_date#); END;
2) SELECT * FROM table_name;
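For comparison, the same two-step pattern on a single connection from a plain SQL client would look roughly like this (a sketch assuming SQL*Plus is available; connection details and #some_date# are placeholders):

```shell
# Both statements run in one SQL*Plus session, i.e. one database connection,
# so the session state set by set_date is visible to the subsequent SELECT.
sqlplus user/password@LINK:PORT/SID <<'EOF'
BEGIN set_date(#some_date#); END;
/
SELECT * FROM table_name;
EOF
```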
Is it possible to achieve the same result using Sqoop? I've tried combining the two queries into one with the following approach:
sqoop import --connect jdbc:oracle:thin:@LINK:PORT:SID \
--username user -P \
--target-dir /some/directory \
--query "BEGIN set_date(#some_date#); END; SELECT * FROM table_name"
But apparently the whole query is then assumed to be PL/SQL, so part 2) of the query is not parsed as expected. I've also tried running only the set_date part of the query with sqoop eval, which works fine, so the only remaining problem is correctly executing the SQL query in 2).
Is it possible to run a second query on the same Sqoop connection? Or is it possible to combine statements 1) and 2) into one so as to get the same result (for now, a simple SELECT from the table would be enough)?
Note: I don't have much experience with SQL and PL/SQL, so I'm sorry if some parts aren't specified clearly enough.