I have a regular ETL job that runs on an AWS EC2 instance.
The workflow looks like the following:
- Bring up the EC2 instance using the `EC2StartInstanceOperator`.
- Find out the public IP using a `boto3` function wrapped inside a `PythonOperator`. This operator pushes the IP to XCom.
- Establish an SSH hook using the public IP and run a remote command using `SSHOperator`.
- Stop the EC2 instance upon completion using `EC2StopInstanceOperator`.
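The IP lookup in the second step can be sketched roughly as follows. This is a minimal sketch under assumptions rather than the exact code in use: `get_public_ip`, `lookup_public_ip`, and `INSTANCE_ID` are hypothetical names, the instance ID is a placeholder, and AWS credentials and region are assumed to be configured for `boto3` already. The EC2 client is passed in as an argument so the lookup logic can be exercised without touching AWS.

```python
from typing import Any, Optional

# Hypothetical placeholder; replace with the real instance ID.
INSTANCE_ID = "i-0123456789abcdef0"


def get_public_ip(ec2_client: Any, instance_id: str) -> Optional[str]:
    """Return the instance's current public IPv4 address, or None if it has none."""
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            if instance["InstanceId"] == instance_id:
                # The key is absent while the instance has no public address.
                return instance.get("PublicIpAddress")
    return None


def lookup_public_ip(**_context: Any) -> Optional[str]:
    """Callable for a PythonOperator; the returned value is pushed to XCom."""
    # Imported lazily so this module can be loaded without boto3 installed.
    import boto3

    return get_public_ip(boto3.client("ec2"), INSTANCE_ID)
```

If this callable is attached to a `PythonOperator` with, say, `task_id="get_public_ip"` (also a hypothetical name), its return value is stored in XCom and a downstream task can read it with `ti.xcom_pull(task_ids="get_public_ip")`.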
The issues with the above are:
- The SSH hook (`airflow.providers.ssh.hooks.ssh.SSHHook` in Airflow 2.0) cannot access XCom; only operators can.
- AWS EC2 instances do not get reassigned the same public IP between runs, so I have to run the `PythonOperator` to find out the public IP during every run.
Thanks!