Yes, this is possible. I'll admit the documentation on how to use these operators is lacking, but if you understand the concept of hooks and operators in Airflow, you can figure it out by reading the code of the operator you're looking to use. In this case, you'll want to read through the SqoopHook and SqoopOperator codebases. Most of what I know how to do with Airflow comes from reading the code; while I haven't used this operator, I'll try to help you out here as best I can.
Let's assume you want to execute this sqoop command:
sqoop import --connect jdbc:mysql://mysql.example.com/testDb --username root --password hadoop123 --table student
And you have a Sqoop server running on a remote host which you can access with the Sqoop client at http://scoop.example.com:12000/sqoop/.
First, you'll need to create the connection in the Airflow Admin UI; call the connection sqoop. For the connection, fill in host as scoop.example.com, schema as sqoop, and port as 12000. If you have a password, you will need to put it into a file on your server and, in extras, fill out a JSON string that looks like {"password_file": "/path/to/password.txt"} (see the hook code for how this password file is handled).
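If you'd rather script the connection than click through the Admin UI, something along these lines should also work. This is just a rough sketch against Airflow's metadata database; the conn_type value and the password file path are placeholders of mine, not anything from your setup, and the password file itself should contain only the password:

from airflow import settings
from airflow.models import Connection

# mirrors the Admin UI fields described above
sqoop_conn = Connection(conn_id='sqoop',
                        conn_type='sqoop',   # stored as a plain string; pick what fits your setup
                        host='scoop.example.com',
                        schema='sqoop',
                        port=12000,
                        extra='{"password_file": "/path/to/password.txt"}')

# persist the connection in the Airflow metadata DB
session = settings.Session()
session.add(sqoop_conn)
session.commit()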
After you set up the connection in the UI, you can create a task using the SqoopOperator in your DAG file. It might look like this:
from airflow.contrib.operators.sqoop_operator import SqoopOperator

sqoop_mysql_export = SqoopOperator(task_id='sqoop_mysql_export',
                                   conn_id='sqoop',
                                   table='student',
                                   username='root',
                                   password='password',
                                   driver='jdbc:mysql://mysql.example.com/testDb',
                                   cmd_type='import')
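For context, here's a minimal sketch of the DAG boilerplate such a task would typically sit inside; the dag_id, start date, and schedule below are placeholders I made up, not anything specific to your setup:

from datetime import datetime

from airflow import DAG

dag = DAG(dag_id='sqoop_mysql_import',        # placeholder name
          start_date=datetime(2017, 1, 1),    # placeholder date
          schedule_interval='@daily')

# pass dag=dag to the SqoopOperator above (or define the task inside a `with dag:` block)
# so the task is registered with this DAG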
The full list of parameters you might want to pass for imports can be found in the code here.
You can see how the SqoopOperator (and really the SqoopHook, which the operator leverages to connect to Sqoop) translates these arguments into command-line commands here.
Really, this SqoopOperator just works by translating the kwargs you pass into sqoop client CLI commands. If you check out the SqoopHook, you can see how that's done and probably figure out how to make it work for your case.
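To make that idea concrete, here's a toy sketch of kwargs-to-CLI translation. This is not the actual SqoopHook code, just an illustration, and build_sqoop_import_cmd is a hypothetical helper of mine:

def build_sqoop_import_cmd(connect, table, username=None, password_file=None):
    # assemble a sqoop CLI invocation from keyword arguments, roughly the way the hook does
    cmd = ['sqoop', 'import', '--connect', connect]
    if username:
        cmd += ['--username', username]
    if password_file:
        cmd += ['--password-file', password_file]
    cmd += ['--table', table]
    return cmd

# build_sqoop_import_cmd('jdbc:mysql://mysql.example.com/testDb', 'student', username='root')
# -> sqoop import --connect jdbc:mysql://mysql.example.com/testDb --username root --table student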
To troubleshoot, I would recommend SSHing into the server you're running Airflow on and confirming you can run the Sqoop client from the command line and connect to the remote Sqoop server. Good luck!