Suppose I have an expensive task, such as running an ML job or an Athena query, that is schedule to run daily. I know that the results from the previous day can be re-used if nothing has changed. Additionally, I can detect if anything has changed using an SQL query.
Can Airflow tasks be composed to implement logic like this?
- For today's task run...
- Get the result of
SELECT MAX(last_updated) FROM items
- If the result matches yesterday's execution, then copy the previous day's results
- Otherwise, run the expensive task
Note I am using Airflow 2.2.2
(MWAA)