I have a cronjob that runs on the cron schedule 05 */1 * * 1-5, or as Crontab Guru puts it, “At minute 5 past every hour on every day-of-week from Monday through Friday.” The cronjob runs in EST rather than UTC.

How can I convert this into a 'America/New_York' timezone aware Airflow DAG that will run the same exact way?

I asked a previous question on timezone aware DAGs in Airflow, but it is not apparent to me from the answer or from the Airflow documentation how to combine a start_date that has tzinfo with a schedule_interval that mimics a cronjob.

I am currently trying to use a DAG defined in my_dag.py as follows:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
import pendulum

local_tz = pendulum.timezone("America/New_York")

default_args=dict(
    owner = 'airflow',
    start_date=datetime(2018, 11, 7, 13, 5, tzinfo=local_tz), # 1:05 PM on Nov 7
    schedule_interval=timedelta(hours=1),
)

dag = DAG('my_test_dag', catchup=False, default_args=default_args)

op = BashOperator(
    task_id='my_test_dag',
    bash_command="bash -i /home/user/shell_script.sh",
    dag=dag
)

However, the DAG never gets scheduled. What am I doing wrong here?

Scott Skiles

1 Answer

Airflow supports cron expressions. schedule_interval is a DAG argument, and it preferably receives a cron expression as a str, or a datetime.timedelta object. Alternatively, you can use one of these cron “presets”: None, @once, @hourly, @daily, @weekly, @monthly, @yearly.

As far as I can see, the timezone awareness is correct, but the schedule interval should be changed: pass it to the DAG constructor as a cron string instead of placing it in default_args.

args = dict(
    owner='airflow',
    start_date=datetime(2018, 11, 7, 13, 5, tzinfo=local_tz),  # 1:05 PM on Nov 7
)

dag = DAG(
    dag_id='dagname_here',
    default_args=args,
    schedule_interval='05 */1 * * 1-5',  # should be a string
)

NOTE: Please be reminded that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
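
To make the note above concrete for the hourly schedule in the question, here is a minimal sketch using only the Python standard library (no Airflow). It assumes Python 3.9+ for zoneinfo and an available time zone database; the helper names matches_schedule and next_tick are hypothetical and are not part of Airflow. It steps through wall-clock minutes, so it does not attempt to handle daylight saving transitions the way a real scheduler would.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

NEW_YORK = ZoneInfo("America/New_York")


def matches_schedule(moment):
    """True when `moment` satisfies '05 */1 * * 1-5': minute 5, Monday to Friday."""
    return moment.minute == 5 and moment.weekday() < 5  # Monday is 0, Friday is 4


def next_tick(after):
    """First schedule tick strictly after `after`, stepping in wall-clock minutes."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while not matches_schedule(candidate):
        candidate += timedelta(minutes=1)
    return candidate


# The start_date from the question: Wednesday, Nov 7 2018, 1:05 PM New York time.
period_start = datetime(2018, 11, 7, 13, 5, tzinfo=NEW_YORK)
period_end = next_tick(period_start)

# The run stamped with period_start is triggered once period_end has passed.
print("run stamped:", period_start.isoformat())
print("triggered after:", period_end.isoformat())
print("same instant in UTC:", period_end.astimezone(timezone.utc).isoformat())
```

Under these assumptions, the first run is stamped with the start of the hourly period and only fires at the following tick, and a tick late on Friday is followed by the first tick on Monday.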

For reference: Airflow Scheduling

SMDC