
I am new to Apache Airflow. After going through the docs, I understand which executors are available in Airflow and their basic working model.

My question is regarding the CeleryExecutor.

When working with this executor, I am unable to find where the DAGs reside.

My Airflow Config is as follows:

airflow_home = /home/airflow
dags_folder = /home/airflow/dags

When I run the command to list DAGs, I get the following output:

-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
example_bash_operator
example_branch_dop_operator_v3
example_branch_operator
example_http_operator
example_passing_params_via_test_command
example_python_operator
example_short_circuit_operator
example_skip_dag
example_subdag_operator
example_subdag_operator.section-1
example_subdag_operator.section-2
example_trigger_controller_dag
example_trigger_target_dag
example_xcom
latest_only
latest_only_with_trigger
test_utils
tutorial

However, there is no dags folder present on disk.

In my cluster, I am running

1 WebServer Node
1 Scheduler + Flower Node
1 MySQL Server Node
2 Celery Worker Nodes

It would be of great help if someone could explain this concept. To be more specific, I want to understand which node of the cluster the dags folder should reside on.

Thanks in advance.

gunj_desai

1 Answer


You're listing the example DAGs that ship with Airflow. Look at the `load_examples` setting in your configuration file, airflow.cfg.

How to remove default example dags in airflow
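As a minimal sketch, the relevant fragment of airflow.cfg might look like the following (paths taken from the question; the setting lives in the `[core]` section):

```ini
[core]
airflow_home = /home/airflow
dags_folder = /home/airflow/dags

# With this set to False, listing DAGs will only show DAGs found in
# dags_folder, not the bundled examples.
load_examples = False
```

Note that the webserver and scheduler need to be restarted for this change to take effect.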

For your cluster, you'll need to sync DAGs and configuration across the different machines of the cluster. Look at https://airflow.apache.org/docs/stable/best-practices.html?highlight=cluster#multi-node-cluster and the Celery section.
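As a rough sketch of the syncing idea (all paths and node names below are illustrative, not taken from your setup): each node running an Airflow component reads DAG files from its own local dags_folder, so identical copies must be pushed to every machine. The loop below simulates your four nodes with local directories; in a real cluster the copy targets would be remote hosts, typically via rsync, a git checkout on each node, or a shared network mount.

```shell
# Hypothetical sketch: distribute one source-of-truth dags folder to
# every node (webserver, scheduler, and both Celery workers).
set -e
SRC=/tmp/dags_source                 # stands in for your DAG repository
mkdir -p "$SRC"
echo "# a DAG definition file" > "$SRC/my_dag.py"

for node in webserver scheduler worker1 worker2; do
    # In production this would be a copy to a remote host, e.g.:
    #   rsync -az --delete "$SRC/" "airflow@$node:/home/airflow/dags/"
    mkdir -p "/tmp/cluster/$node/dags"
    cp -R "$SRC/." "/tmp/cluster/$node/dags/"
done
```

Whatever mechanism you choose, the key point is that every machine ends up with an identical copy of the files at the path named by dags_folder.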

Antoine Augusti
  • Got it, thanks for the info. But maybe I wasn't clear: now that I can create my own DAGs, which node of the cluster will they reside on? – gunj_desai Jan 30 '18 at 14:55
  • You'll need to sync DAGs and configuration across the different machines of the cluster. Look at https://airflow.apache.org/configuration.html and the Celery section – Antoine Augusti Jan 30 '18 at 15:06
  • Thanks a lot. Could you please edit your answer to include the comment? I will mark it as the answer; it may be helpful as a direct answer for someone else. – gunj_desai Jan 31 '18 at 05:55
  • Just did that. Thanks – Antoine Augusti Jan 31 '18 at 09:39
  • Except it's 404'd now. Read the docs: https://incubator-airflow.readthedocs.io/en/latest/configuration.html#scaling-out-with-celery – Forest May 08 '19 at 03:24
  • Seems the right place is here now: https://airflow.apache.org/docs/stable/best-practices.html?highlight=cluster#multi-node-cluster – fjsj Jan 23 '20 at 20:24