
I'm new to Airflow and I'm currently building a DAG that will execute a PythonOperator, a BashOperator, and then another PythonOperator structured like this:

def authenticate_user(**kwargs):
    ...
    list_prev = [...]

AUTHENTICATE_USER = PythonOperator(
    task_id='AUTHENTICATE_USER',
    python_callable=authenticate_user,
    provide_context=True,
    dag=dag)

CHANGE_ROLE = BashOperator(
    task_id='CHANGE_ROLE',
    bash_command='...',
    dag=dag)

def calculations(**kwargs):
    list_prev
    ...

CALCULATIONS = PythonOperator(
    task_id='CALCULATIONS',
    python_callable=calculations,
    provide_context=True,
    dag=dag)

My issue is that I create a list of variables in the first PythonOperator (AUTHENTICATE_USER) that I would like to use later in the second PythonOperator (CALCULATIONS), after executing the BashOperator (CHANGE_ROLE). Is there a way to carry that list over to other PythonOperators in the same DAG?

Thank you

JMV12

1 Answer


I can think of 3 possible ways (to avoid confusion with Airflow's concept of a Variable, I'll refer to the data you want to share between tasks as values):

  1. Airflow XCOMs: push your values from the AUTHENTICATE_USER task and pull them in your CALCULATIONS task. You can either push and pull each value separately or wrap them all into a single Python dict or list (better, as it reduces db reads and writes); see the first sketch after this list.

  2. External system: persist your values from the first task in some external system such as a database, files, or S3 objects, and read them back in downstream tasks when needed; see the file-based sketch below.

  3. Airflow Variables: this is a specific case of point (2) above (Variables are stored in Airflow's backend meta-db). You can programmatically create, modify, or delete Variables by exploiting the underlying SQLAlchemy model. See this for hints, and the last sketch below.
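
For point (1), here is a minimal sketch of the XCom approach, assuming the Airflow 1.x-style PythonOperator with provide_context=True from your DAG (the contents of list_prev are just placeholders):

def authenticate_user(**kwargs):
    list_prev = ['foo', 'bar']  # placeholder for the list you actually build
    # push the whole list as a single XCom (one db write instead of many)
    kwargs['ti'].xcom_push(key='list_prev', value=list_prev)

def calculations(**kwargs):
    # pull the list that the AUTHENTICATE_USER task pushed
    list_prev = kwargs['ti'].xcom_pull(task_ids='AUTHENTICATE_USER', key='list_prev')
    # ... your calculations using list_prev

Returning list_prev from authenticate_user would work too: the return value is pushed automatically under the key 'return_value', which is also the default key for xcom_pull. The intermediate BashOperator doesn't get in the way, since XComs are keyed by task_id rather than task order. Just keep in mind that XCom values live in the meta-db, so they should stay small and serializable.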
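
For point (2), the simplest variant is a plain file on a path both tasks can reach (the /tmp path below is only an illustration and assumes both tasks run on the same worker; with multiple workers you'd use a shared store such as S3 or a database instead):

import json

SHARED_PATH = '/tmp/authenticate_user_list.json'  # illustrative path

def authenticate_user(**kwargs):
    list_prev = ['foo', 'bar']  # placeholder values
    # write the list out so a later task can read it back
    with open(SHARED_PATH, 'w') as f:
        json.dump(list_prev, f)

def calculations(**kwargs):
    with open(SHARED_PATH) as f:
        list_prev = json.load(f)
    # ... your calculations using list_prev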
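
And for point (3), a sketch using Airflow Variables; the Variable key 'list_prev' is just an illustrative name, and the json round-trip is there because Variables are stored as strings in the meta-db:

import json
from airflow.models import Variable

def authenticate_user(**kwargs):
    list_prev = ['foo', 'bar']  # placeholder values
    # serialize the list before storing it as a Variable
    Variable.set('list_prev', json.dumps(list_prev))

def calculations(**kwargs):
    # read the Variable back and deserialize it into a Python list
    list_prev = json.loads(Variable.get('list_prev'))
    # ... your calculations using list_prev

Unlike XComs, Variables are not scoped to a dag_run, so concurrent runs of the DAG would overwrite each other's value.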

y2k-shubham