9

I don't understand why we need a 'start_date' for the operators (task instances). Shouldn't the one we pass to the DAG suffice?

Also, suppose the current time is 7th Feb 2018, 8.30 am UTC, and I set the start_date of the DAG to 7th Feb 2018, 0.00 am, with the cron expression for the schedule interval being 30 9 * * * (daily at 9.30 am, i.e. I expect it to run within the next hour). Will my DAG run today at 9.30 am or tomorrow (8th Feb at 9.30 am)?
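In code, the setup I'm describing looks roughly like this (the dag_id and task_id are placeholders; Airflow 1.x import paths assumed):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # The scenario from the question: start_date at midnight, daily run at 09:30.
    dag = DAG(
        dag_id="example_daily_dag",
        start_date=datetime(2018, 2, 7),   # 7th Feb 2018, 00:00 UTC
        schedule_interval="30 9 * * *",    # daily at 09:30
    )

    do_something = DummyOperator(task_id="do_something", dag=dag)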

soupybionics
  • With the above setting, it didn't run at 9.30 am today. Am I missing something here, or will it run tomorrow (8th Feb at 9.30 am)? – soupybionics Feb 07 '18 at 10:30

4 Answers

13

Regarding start_date on a task instance: personally I have never used it; I always just set a single start_date on the DAG.

However, from what I can see, it would allow you to have certain tasks start at a different time from the main DAG. It appears to be a legacy feature, and from reading the FAQ the recommendation is to use time sensors for that type of thing instead, and to have one start_date for all tasks, passed through the DAG.

Your second question:

The execution date for a run is always the previous period based on your schedule.

From the docs (Airflow Docs):

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

To clarify:

  • On a daily schedule, the run that executes on the 8th has an execution date of the 7th.
  • On a weekly schedule that runs on Sundays, the run that executes this Sunday has an execution date of last Sunday.
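Applied to your numbers, a rough sketch of how the schedule points line up (this assumes the Airflow 1.x DAG.following_schedule helper; the dag_id is a placeholder):

    from datetime import datetime

    from airflow import DAG

    # Same configuration as in the question.
    dag = DAG(
        dag_id="execution_date_demo",
        start_date=datetime(2018, 2, 7),   # 2018-02-07 00:00 UTC
        schedule_interval="30 9 * * *",    # daily at 09:30
    )

    # First schedule point on or after start_date: 2018-02-07 09:30.
    first_execution_date = dag.following_schedule(datetime(2018, 2, 7))
    print(first_execution_date)

    # The run stamped with that execution date only triggers once the period
    # it covers has ended, i.e. at the next schedule point: 2018-02-08 09:30.
    print(dag.following_schedule(first_execution_date))

Which matches your comment: nothing fires on the 7th at 9.30 am; the first run, stamped 2018-02-07 09:30, kicks off around 2018-02-08 09:30.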
Blakey
1

Some complex requirements may need specific timings at the task level. For example, I may want my DAG to run each day for a full week before some aggregation logging task starts running; to achieve this I could set a different start date at the task level.

A bit more useful info: looking through the Airflow DAG class source, it appears that setting start_date at the DAG level simply means it is passed through to a task when no default task start_date was passed in to the DAG via the default_args dict and no specific start_date is defined at the task level. So for any case where you want all tasks in a DAG to kick off at the same time (dependencies aside), setting start_date at the DAG level is sufficient.
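A sketch of that precedence (dag_id, task ids and dates are placeholders; Airflow 1.x import paths assumed):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    # DAG-level default applied to every task that doesn't set its own start_date.
    default_args = {
        "start_date": datetime(2018, 2, 1),
    }

    dag = DAG(
        dag_id="start_date_precedence_demo",
        default_args=default_args,
        schedule_interval="@daily",
    )

    # Inherits start_date 2018-02-01 from default_args.
    daily_task = DummyOperator(task_id="daily_task", dag=dag)

    # An explicit task-level start_date wins, so this task is only scheduled
    # from a week later (the aggregation example above).
    weekly_aggregation = DummyOperator(
        task_id="weekly_aggregation",
        start_date=datetime(2018, 2, 8),
        dag=dag,
    )

    daily_task >> weekly_aggregation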

Jinglesting
0

Just to add to what is already here: a task that depends on another task (or tasks) must have a start_date greater than or equal to the start_date of its dependencies.

For example:

  • if task_a depends on task_b
  • you cannot have
    • task_a start_date = 1/1/2019
    • task_b start_date = 1/2/2019
    • If you do, task_a will not be runnable for 1/1/2019, since task_b will not run for that date and you cannot mark it as complete either (see the sketch below)
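A sketch of that problematic configuration (dag_id and schedule are placeholders; Airflow 1.x import paths assumed):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG(
        dag_id="start_date_dependency_demo",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
    )

    # The upstream task only starts on 2nd Jan 2019...
    task_b = DummyOperator(
        task_id="task_b",
        start_date=datetime(2019, 1, 2),
        dag=dag,
    )

    # ...but the downstream task starts on 1st Jan 2019, so task_a's
    # 2019-01-01 instance waits forever on a task_b run that never exists.
    task_a = DummyOperator(
        task_id="task_a",
        start_date=datetime(2019, 1, 1),
        dag=dag,
    )

    task_b >> task_a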

Why would you want this?

  • I would have liked this logic for a task that was an external task sensor waiting for the completion of another DAG. But the other DAG had a start date after the current DAG's, so I didn't want the dependency in place for days when the other DAG didn't exist.
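A rough sketch of that use case (dag and task ids are placeholders; the Airflow 1.10-style ExternalTaskSensor import and its external_task_id=None behaviour are assumed):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.sensors.external_task_sensor import ExternalTaskSensor

    dag = DAG(
        dag_id="downstream_dag",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
    )

    # The other DAG only exists from 2nd Jan 2019, so the sensor (and anything
    # downstream of it) gets the later start_date rather than the DAG's.
    wait_for_other_dag = ExternalTaskSensor(
        task_id="wait_for_other_dag",
        external_dag_id="other_dag",
        external_task_id=None,              # wait for the whole DAG run
        start_date=datetime(2019, 1, 2),
        dag=dag,
    )

    process = DummyOperator(
        task_id="process",
        start_date=datetime(2019, 1, 2),    # not earlier than its dependency
        dag=dag,
    )

    wait_for_other_dag >> process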
Arran Duff
0

A likely cause is not setting the dag parameter on your tasks, as explained in https://stackoverflow.com/a/61749549/1743724.
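A sketch of that pitfall (ids are placeholders; Airflow 1.x import paths assumed): a task created without being attached to a DAG is never scheduled.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    dag = DAG(
        dag_id="attach_tasks_demo",
        start_date=datetime(2018, 2, 7),
        schedule_interval="@daily",
    )

    # Problem: no dag= argument (and no `with dag:` block), so this operator
    # never becomes part of the DAG and is never scheduled.
    orphan_task = DummyOperator(task_id="orphan_task")

    # Fix: pass the dag explicitly...
    attached_task = DummyOperator(task_id="attached_task", dag=dag)

    # ...or create tasks inside the DAG's context manager.
    with dag:
        also_attached_task = DummyOperator(task_id="also_attached_task")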

smbanaei