1

Suppose I have an expensive task, such as running an ML job or an Athena query, that is schedule to run daily. I know that the results from the previous day can be re-used if nothing has changed. Additionally, I can detect if anything has changed using an SQL query.

Can Airflow tasks be composed to implement logic like this?

  • For today's task run...
  • Get the result of SELECT MAX(last_updated) FROM items
    • If the result matches yesterday's execution, then copy the previous day's results
    • Otherwise, run the expensive task

Note I am using Airflow 2.2.2 (MWAA)

sdgfsdh
  • 33,689
  • 26
  • 132
  • 245

1 Answers1

0

Can Airflow tasks be composed to implement logic like this?

Yes. This can be achieved with the BranchPythonOperator. See this StackOverflow post for an example.

Reference: BranchPythonOperator (Airflow)

Andrew Nguonly
  • 2,258
  • 1
  • 17
  • 23