I just started reading about Prefect (and have a little experience using Airflow).
My goal is to set up a task in Prefect that runs daily and collects data into a folder (I assume that's something Prefect can help with). I also want to populate the historical data, as if I had been running this job in the past.
In Airflow there is the concept of start_date: when it is set in the past, the DAG runs from that date onward and populates each interval.
For example, if I have a task which takes a date and returns the data for that date, such as:
# Pseudo code
def get_values_from_somewhere(date: datetime) -> dict:
    return fetched_values_in_json(date)
Is there a native way to do this in Prefect? I could not find this answered anywhere on here or in the docs, though backfilling is mentioned here. Any help or guidance would be very useful.
What I tried:
When I set the schedule to be:
from datetime import datetime, timedelta
from prefect.schedules import Schedule
from prefect.schedules.clocks import IntervalClock  # import needed for IntervalClock below
schedule = Schedule(clocks=[IntervalClock(interval=timedelta(hours=24), start_date=datetime(2019, 1, 1))])
and then call flow.run(), I simply get:
INFO:prefect.My-Task:Waiting for next scheduled run at 2020-09-24T00:00:00+00:00
What I was expecting is for the flow to run once for every interval since the start_date I provided, then pause once it reaches the present time and wait for the next scheduled run.
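To make the expected behaviour concrete, here is a minimal plain-Python sketch of the backfill I have in mind. It does not use any Prefect API. The names `backfill_dates`, `run_backfill` and `fake_fetch` are hypothetical, and `fake_fetch` is only a stand-in for get_values_from_somewhere. The end date is fixed so the example is reproducible; in practice it would be the current time.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterator


def backfill_dates(start: datetime, end: datetime, interval: timedelta) -> Iterator[datetime]:
    """Yield every run date from start up to (but not including) end."""
    current = start
    while current < end:
        yield current
        current += interval


def run_backfill(
    task: Callable[[datetime], dict],
    start: datetime,
    end: datetime,
    interval: timedelta = timedelta(hours=24),
) -> Dict[datetime, dict]:
    """Call task once per past interval and collect the results keyed by run date."""
    return {date: task(date) for date in backfill_dates(start, end, interval)}


def fake_fetch(date: datetime) -> dict:
    # Hypothetical stand-in for get_values_from_somewhere(date)
    return {"date": date.strftime("%Y-%m-%d")}


# Backfill three daily runs: 2019-01-01, 2019-01-02 and 2019-01-03
results = run_backfill(fake_fetch, datetime(2019, 1, 1), datetime(2019, 1, 4))
```

After the loop catches up to the end date, I would expect the regular daily schedule to take over from there.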