The answer is yes, the cron schedule supports having DAGs run in DST-aware timezones. But there are a number of caveats, so I have to assume the maintainers of Airflow do not have this as a supported use case. Firstly, the documentation, as of the time of writing, is explicitly wrong when it states:
Cron schedules
In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will then ignore day light savings time. Thus, if you have a schedule that says run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, regardless if day light savings time is in place.
I've written this somewhat hacky code which lets you see how a schedule will work without the need for a running Airflow instance (make sure you have Pendulum 1.x installed and are reading the matching documentation if you run or edit this code):
import pendulum
from airflow import DAG
from datetime import timedelta

# Set up the DAG
test_dag = DAG(
    dag_id='foo',
    start_date=pendulum.datetime(year=2019, month=4, day=4, tz='Pacific/Auckland'),
    schedule_interval='00 03 * * *',
    catchup=False
)

# Walk the schedule forward over the next 7 runs
execution_date = test_dag.start_date
for _ in range(7):
    next_execution_date = test_dag.following_schedule(execution_date)
    if next_execution_date <= execution_date:
        # Work around following_schedule not advancing across the DST change
        execution_date = test_dag.following_schedule(execution_date + timedelta(hours=2))
    else:
        execution_date = next_execution_date
    print('Execution Date:', execution_date)
This gives us a 7-day period that spans New Zealand's DST changeover:
Execution Date: 2019-04-03 14:00:00+00:00
Execution Date: 2019-04-04 14:00:00+00:00
Execution Date: 2019-04-05 14:00:00+00:00
Execution Date: 2019-04-06 14:00:00+00:00
Execution Date: 2019-04-07 15:00:00+00:00
Execution Date: 2019-04-08 15:00:00+00:00
Execution Date: 2019-04-09 15:00:00+00:00
As we can see, DST is observed when using the cron schedule; further, if you edit my code to remove the cron schedule (as in the sketch below) you can see that DST is not observed.
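For contrast, here is a minimal sketch of the same check with the cron expression swapped for a timedelta (the dag_id and variable names are mine, not from the original script); with a timedelta interval Airflow simply adds the delta each run, so the printed UTC times should stay fixed across the transition:

import pendulum
from airflow import DAG
from datetime import timedelta

# Same start date and timezone as above, but with a timedelta schedule
# instead of a cron expression (dag_id is illustrative).
timedelta_dag = DAG(
    dag_id='foo_timedelta',
    start_date=pendulum.datetime(year=2019, month=4, day=4, tz='Pacific/Auckland'),
    schedule_interval=timedelta(days=1),
    catchup=False
)

# With a timedelta interval, following_schedule just adds the delta,
# so each run should land at the same UTC time regardless of DST.
execution_date = timedelta_dag.start_date
for _ in range(7):
    execution_date = timedelta_dag.following_schedule(execution_date)
    print('Execution Date:', execution_date)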
But be warned: even with the cron schedule observing DST you may still have an off-by-one-day error on the day of the DST change, because Airflow provides the previous date and not the current date (e.g. it is Sunday on the calendar, but in Airflow the execution date is Saturday). It doesn't look to me like this is accounted for in the following_schedule logic.
Finally, as @dlamblin points out, the variables that Airflow provides to jobs, either via templated strings or via provide_context=True for Python callables, will be wrong if the local execution date for the DAG is not the same as the UTC execution date. This can be observed in TaskInstance.get_template_context, which uses self.execution_date without modifying it to be in local time, and we can see in TaskInstance.__init__ that self.execution_date is converted to UTC.
The way I handle this is to derive a variable I call local_cal_date by doing what @dlamblin suggests and using the convert method from Pendulum. Edit this code to fit your specific needs (I actually use it in a wrapper around all my Python callables so that they all receive local_cal_date):
import datetime


def foo(*args, dag, execution_date, **kwargs):
    # Derive the local execution datetime from the dag and execution_date that
    # Airflow passes to Python callables where provide_context is set to True
    airflow_timezone = dag.timezone
    local_execution_datetime = airflow_timezone.convert(execution_date)
    # I then add 1 day to make it the calendar day
    # and not the execution date which Airflow provides
    local_cal_datetime = local_execution_datetime + datetime.timedelta(days=1)
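For completeness, here is a rough sketch of how a callable like this might be wired into a DAG with provide_context=True (Airflow 1.x style); the task_id is illustrative and test_dag is assumed to be a DAG object like the one defined earlier:

from airflow.operators.python_operator import PythonOperator

# Illustrative wiring only: with provide_context=True the callable receives
# dag and execution_date as keyword arguments, which foo() above relies on.
derive_local_date = PythonOperator(
    task_id='derive_local_cal_date',
    python_callable=foo,
    provide_context=True,
    dag=test_dag,
)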
Update: For templated strings I found the best approach was to create custom operators that inject the custom variables into the context before the template is rendered. The problem I found with using custom macros is that they don't expand other macros automatically, which means you have to do a bunch of extra work to render them in a useful way. So in a custom operators module I have something similar to this code:
# Standard Library
import datetime

# Third Party Libraries
import airflow.operators.email_operator
import airflow.operators.python_operator
import airflow.operators.bash_operator


class CustomTemplateVarsMixin:
    def render_template(self, attr, content, context):
        # Do calculations
        airflow_execution_datetime = context['execution_date']
        airflow_timezone = context['dag'].timezone
        local_execution_datetime = airflow_timezone.convert(airflow_execution_datetime)
        local_cal_datetime = local_execution_datetime + datetime.timedelta(days=1)
        # Add to the context
        context['local_cal_datetime'] = local_cal_datetime
        # Run the normal method
        return super().render_template(attr, content, context)


class BashOperator(CustomTemplateVarsMixin, airflow.operators.bash_operator.BashOperator):
    pass


class EmailOperator(CustomTemplateVarsMixin, airflow.operators.email_operator.EmailOperator):
    pass


class PythonOperator(CustomTemplateVarsMixin, airflow.operators.python_operator.PythonOperator):
    pass


class BranchPythonOperator(CustomTemplateVarsMixin, airflow.operators.python_operator.BranchPythonOperator):
    pass