I'm pretty new to Airflow, and I'm having this problem: I have a DAG that processes txt files and converts them to CSV. This is the configuration:
from datetime import datetime, timedelta

from airflow import DAG

# Midnight of yesterday, recomputed every time the DAG file is parsed
one_days_ago = datetime.combine(datetime.today() - timedelta(days=1),
                                datetime.min.time())

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': one_days_ago,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
    'max_active_runs': 1,
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG('process_file', default_args=default_args, schedule_interval='@daily')
The problem is that when the DAG runs, it processes the file for the current day, but it also produces results for previous runs, so instead of a single CSV file for today I end up with that one plus 4 or 5 files from previous days. I have read about backfill, but I'm not sure how to avoid it, or what I'm doing wrong. Any suggestions? Is it also possible to clean up the results of successful runs from previous executions?
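From what I have read, the scheduler creates one run per schedule interval between start_date and now, and I gather the catchup flag on the DAG is supposed to stop that. This is a minimal sketch of what I was thinking of trying (untested, and I'm not sure it's the right fix):

dag = DAG(
    'process_file',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False,  # only schedule the latest interval, don't backfill older ones
)

Is that the recommended way to do it, or is the fact that my start_date moves every day (it's recomputed each time the file is parsed) the real problem?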