I'm running Airflow 1.9.0 with LocalExecutor and a PostgreSQL database on a Linux AMI. I want to trigger DAGs manually, but whenever I create a DAG with `schedule_interval` set to `None` or to `'@once'`, the webserver's tree view crashes with the following error (only the last call shown):

File "/usr/local/lib/python2.7/site-packages/croniter/croniter.py", line 467, in expand 
    raise CroniterBadCronError(cls.bad_length)
CroniterBadCronError: Exactly 5 or 6 columns has to be specified for iteratorexpression.

Furthermore, when I manually trigger the DAG, a DAG run starts but the tasks themselves are never scheduled. I've looked around, but it seems that I'm the only one with this type of error. Has anyone encountered this error before and found a fix?

Minimal example triggering the problem:

import datetime as dt
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'me'
}

bash_command = """
    echo "this is a test task"
"""

with DAG('schedule_test',
         default_args=default_args,
         start_date=dt.datetime(2018, 7, 24),
         schedule_interval='None',
         catchup=False) as dag:

    first_task = BashOperator(task_id="first_task", bash_command=bash_command)
T. van Hees

2 Answers

Try this:

  • Set your schedule_interval to None without the '' (i.e. the Python None object, not the string 'None'), or simply do not specify schedule_interval in your DAG; it defaults to None. More information on that here: airflow docs -- search for schedule_interval
  • Set up the orchestration (task dependencies) for your tasks at the bottom of the DAG.

Like so:

from datetime import datetime  # import the class; plain `import datetime` would break datetime(2018, 7, 24)
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

default_args = {
    'owner': 'me'
}

bash_command = """
    echo "this is a test task"
"""

with DAG('schedule_test',
         default_args=default_args,
         start_date=datetime(2018, 7, 24),
         schedule_interval=None,  # the None object, without quotes
         catchup=False) as dag:

    # Inside the `with DAG(...)` block the operators pick up the DAG
    # automatically, so no explicit dag=dag is needed.
    t1 = DummyOperator(
        task_id='extract_data'
    )

    t2 = BashOperator(
        task_id='first_task',
        bash_command=bash_command
    )

    #####ORCHESTRATION#####
    ## For t2 to run, t1 must be done first.
    t2.set_upstream(t1)
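
As for why the quoted version crashes the tree view: Airflow passes any string schedule_interval to croniter as a cron expression, so the literal string 'None' fails croniter's five-or-six-field check. You can reproduce the error outside Airflow (a minimal sketch, assuming the croniter package pulled in by your Airflow install):

from croniter import croniter

# 'None' is parsed as a cron expression; it has one field instead of the
# required five or six, so the constructor raises CroniterBadCronError.
try:
    croniter('None')
except Exception as exc:
    print(type(exc).__name__, ':', exc)

Once schedule_interval is the real None, triggering by hand with `airflow trigger_dag schedule_test` (or the Trigger DAG button in the web UI) should create a run whose tasks actually get scheduled.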
Zack
  • Thanks for your help @Zack! I figured out yesterday that you need to set `schedule_interval` to the Python object `None` for it to work, haha. Orchestration of the DAG is not really an issue; I'm only struggling with the schedule intervals. – T. van Hees Jul 26 '18 at 05:37
  • By the way, while the BaseOperator may have `schedule_interval = None` by default, for the DAG object it is set to `schedule_interval = timedelta(1)`, see https://airflow.apache.org/code.html#airflow.models.DAG. – T. van Hees Jul 26 '18 at 05:39
  • Also, this still leaves the problem with the DAG set to `schedule_interval='@once'`. I get that Airflow won't schedule a DAG run by itself (since I didn't set `end_date`), but it should still schedule manually triggered DAG runs. – T. van Hees Jul 26 '18 at 05:42
  • Thanks @Zack! Your answer helped me. However, I believe if you remove schedule_interval, the default will be "@daily", at least in Airflow 1.10. – SMDC Jan 14 '19 at 09:17
  • Does adding `schedule_interval=None` to default_args work? – Prathamesh dhanawade Jan 12 '21 at 17:17