Through my recent research I've come to realise that Airflow's schedule_interval has some quirks. I've done my best to work out how they might be affecting what I'm doing, but haven't quite managed it.

I'm using these default args:

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2019, 1, 12),
    'email': ['email@domain.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'schedule_interval': '0 0,12 * * *'
}

and I would like the DAG to run at midnight and noon.

Currently it only runs at midnight and I can't understand why. I'm running this in Google Cloud Composer if that makes any difference.

*edit - fixed typo

Mike Sumner
  • This could potentially help you: https://stackoverflow.com/questions/41730297/python-script-scheduling-in-airflow – Paulie Feb 15 '19 at 10:43
  • Thanks, but I'm afraid that doesn't go into much detail on the intricacies of schedule_interval – Mike Sumner Feb 15 '19 at 10:51

1 Answer

I would use an "every twelfth hour" cron expression, rather than "on hour 0 and 12". As you've probably read, Airflow works by creating intervals and scheduling each run at the end of its interval. Intervals are created by adding the period described by the cron expression to the start date of the DAG.
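To make the interval idea concrete, here is a minimal sketch using only the standard library, not Airflow itself. It assumes a fixed 12-hour period counted from the start_date in the question, and the helper name run_times is made up for illustration. Each run covers one interval and is only triggered once that interval has ended.

```python
from datetime import datetime, timedelta


def run_times(start_date, period, count):
    """Return (interval_start, trigger_time) pairs for the first `count` runs.

    Hypothetical helper: it models the rule that the run for an interval
    is triggered at the end of that interval, not at its start.
    """
    runs = []
    interval_start = start_date
    for _ in range(count):
        trigger_time = interval_start + period  # end of the interval
        runs.append((interval_start, trigger_time))
        interval_start = trigger_time  # next interval begins where this one ended
    return runs


if __name__ == "__main__":
    start = datetime(2019, 1, 12)  # start_date from the question
    for interval_start, trigger_time in run_times(start, timedelta(hours=12), 3):
        print(f"interval starting {interval_start} is triggered at {trigger_time}")
```

Under these assumptions the first run is triggered at noon on 12 January, covering the interval that began at midnight.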

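Try 'schedule_interval': '0 */12 * * *'. It will fire at the same times as your expression, since your start date is at midnight. It may also be worth checking that schedule_interval is passed to the DAG constructor itself: it is a DAG argument, and entries in default_args are only forwarded to the operators.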
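As a rough check that the two expressions name the same hours, the sketch below expands just the hour field of each one. It is a simplified illustration, not how Airflow parses cron: the function expand_hour_field is hypothetical and only understands comma lists, "*" and "*/step".

```python
def expand_hour_field(field):
    """Expand a cron hour field into the sorted list of hours it matches.

    Simplified sketch: handles only comma lists ("0,12"), "*" and
    "*/step", which is enough for the two expressions compared here.
    """
    hours = set()
    for part in field.split(","):
        if part == "*":
            hours.update(range(24))
        elif part.startswith("*/"):
            step = int(part[2:])
            hours.update(range(0, 24, step))  # every `step` hours from 0
        else:
            hours.add(int(part))
    return sorted(hours)


if __name__ == "__main__":
    for expr in ("0 0,12 * * *", "0 */12 * * *"):
        hour_field = expr.split()[1]  # second field of a cron expression is the hour
        print(expr, "->", expand_hour_field(hour_field))
```

Both hour fields expand to hours 0 and 12, so the change is one of notation rather than of firing times.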

gogstad