Questions tagged [airflow]

Apache Airflow is a workflow management platform to programmatically author, schedule, and monitor workflows as directed acyclic graphs (DAGs) of tasks.

Airflow was originally developed at Airbnb to manage its complex workflows.
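
As a rough illustration of the "DAG of tasks" idea in the tag description, the sketch below orders a few tasks so that each one runs only after the tasks it depends on. It uses Python's standard-library `graphlib`, not Airflow's own API. The task names and the `execution_order` helper are hypothetical, made up for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid order in which every task follows its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because "load" and "report" do not depend on each other, either may come first. A scheduler is free to run such independent tasks in parallel.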

10104 questions

190 votes, 17 answers

Proper way to create dynamic workflows in Airflow

Problem: Is there any way in Airflow to create a workflow such that the number of tasks B.* is unknown until completion of Task A? I have looked at subdags but it looks like it can only work with a static set of tasks that have to be determined at…
costrouc • 3,045 • 2 • 19 • 23

99 votes, 9 answers

How to remove default example dags in airflow

I am a new user of Airbnb's open-source workflow/data pipeline software, Airflow. There are dozens of default example DAGs after the web UI is started. I tried many ways to remove these DAGs, but I've failed to do so. load_examples = False is set…
bronzels • 1,283 • 2 • 10 • 16

96 votes, 19 answers

Airflow: how to delete a DAG?

I have started the Airflow webserver and scheduled some DAGs. I can see the DAGs in the web GUI. How can I delete a particular DAG so that it is no longer run or shown in the web GUI? Is there an Airflow CLI command to do that? I looked around but could not find an…
subba • 1,625 • 2 • 16 • 18

86 votes, 7 answers

execution_date in airflow: need to access as a variable

I am new to this forum, but I have been playing with Airflow for some time at our company. Sorry if this question sounds really dumb. I am writing a pipeline using a bunch of BashOperators. Basically, for each task, I want to simply…
Roger • 2,823 • 3 • 25 • 32

85 votes, 4 answers

Apache Airflow or Apache Beam for data processing and job scheduling

I'm trying to give useful information but I am far from being a data engineer. I am currently using the Python library pandas to execute a long series of transformations on my data, which has a lot of inputs (currently CSV and Excel files). The…
LouisB • 973 • 1 • 7 • 9

77 votes, 3 answers

How to create a conditional task in Airflow

I would like to create a conditional task in Airflow as described in the schema below. The expected scenario is the following: Task 1 executes. If Task 1 succeeds, then execute Task 2a. Else, if Task 1 fails, then execute Task 2b. Finally, execute Task…
Alexis.Rolland • 5,724 • 6 • 50 • 77

77 votes, 3 answers

How to prevent airflow from backfilling dag runs?

Say you have an airflow DAG that doesn't make sense to backfill, meaning that, after it's run once, running it subsequent times quickly would be completely pointless. For example, if you're loading data from some source that is only updated hourly…
m0meni • 16,006 • 16 • 82 • 141

70 votes, 6 answers

How to stop/kill Airflow tasks from the UI

How can I stop/kill a running task from the Airflow UI? I am using LocalExecutor. Even if I use CeleryExecutor, how can I kill/stop the running task?
Chetan J • 1,847 • 5 • 16 • 21

65 votes, 3 answers

How to control the parallelism or concurrency of an Airflow installation?

In some of my Apache Airflow installations, DAGs or tasks that are scheduled to run do not run even when the scheduler doesn't appear to be fully loaded. How can I increase the number of DAGs or tasks that can run concurrently? Similarly, if my…
hexacyanide • 88,222 • 31 • 159 • 162

65 votes, 15 answers

Airflow 1.9.0 is queuing but not launching tasks

Airflow is randomly not running queued tasks; some tasks don't even get queued status. I keep seeing the message below in the scheduler logs: [2018-02-28 02:24:58,780] {jobs.py:1077} INFO - No tasks to consider for execution. I do see tasks in the database that…
l0n3r4n83r • 1,271 • 1 • 14 • 25

65 votes, 3 answers

Writing to Airflow Logs

One way to write to the logs in Airflow is to return a string from a PythonOperator like on line 44 here. Are there other ways that allow me to write to the airflow log files? I've found that print statements are not saved to the logs.
thatkaiguy • 753 • 1 • 5 • 6

64 votes, 6 answers

Airflow - How to pass xcom variable into Python function

I need to reference a variable that's returned by a BashOperator. In my task_archive_s3_file, I need to get the filename from get_s3_file. The task simply prints {{ ti.xcom_pull(task_ids=submit_file_to_spark) }} as a string instead of the value. If…
sdot257 • 10,046 • 26 • 88 • 122

63 votes, 8 answers

Error while install airflow: By default one of Airflow's dependencies installs a GPL

Getting the following error after running pip install airflow[postgres] command: raise RuntimeError("By default one of Airflow's dependencies installs a GPL " RuntimeError: By default one of Airflow's dependencies installs a GPL…
Md Sirajus Salayhin • 4,974 • 5 • 37 • 46

62 votes, 8 answers

Airflow s3 connection using UI

I've been trying to use Airflow to schedule a DAG. One of the DAGs includes a task that loads data from an S3 bucket. For that purpose, I need to set up an S3 connection, but the UI provided by Airflow isn't that intuitive…
Nikhil Reddy • 723 • 1 • 5 • 6

60 votes, 13 answers

How do I restart airflow webserver?

I am using Airflow for my data pipeline project. I have configured my project in Airflow and started the Airflow server as a background process using the following command: airflow webserver -p 8080 -D True. The server is running successfully in the background. Now I want…
MJK • 1,381 • 3 • 15 • 22