The LocalExecutor spawns new processes while scheduling tasks. Is there a limit to the number of processes it creates? I need to change it. I also need to know the difference between the scheduler's "max_threads" and "parallelism" in airflow.cfg.
3 Answers
parallelism: not a very descriptive name. The description says it sets the maximum task instances for the airflow installation, which is a bit ambiguous — if I have two hosts running airflow workers, I'd have airflow installed on two hosts, so that should be two installations, but based on context 'per installation' here means 'per Airflow state database'. I'd name this max_active_tasks.
dag_concurrency: Despite the name, based on the comment this is actually the task concurrency, and it's per worker. I'd name this max_active_tasks_for_worker (per_worker would suggest that it's a global setting for workers, but I think you can have workers with different values set for this).
max_active_runs_per_dag: This one's kinda alright, but since it seems to be just a default value for the matching DAG kwarg, it might be nice to reflect that in the name, something like default_max_active_runs_for_dags.
So let's move on to the DAG kwargs:
concurrency: Again, having a general name like this, coupled with the fact that concurrency is used for something different elsewhere makes this pretty confusing. I'd call this max_active_tasks.
max_active_runs: This one sounds alright to me.
source: https://issues.apache.org/jira/browse/AIRFLOW-57
max_threads gives the user some control over cpu usage. It specifies scheduler parallelism.
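For orientation, here is a minimal sketch of where these settings sit in an Airflow 1.x-style airflow.cfg, read with Python's standard configparser. The section placement ([core] and [scheduler]) follows the 1.x config layout; the values are illustrative, and the helper name read_limits is made up for this example.

```python
import configparser

# Illustrative fragment in the style of an Airflow 1.x airflow.cfg.
# The values are examples only, not recommendations.
SAMPLE_CFG = """
[core]
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16

[scheduler]
max_threads = 2
"""


def read_limits(cfg_text):
    """Hypothetical helper: pull the concurrency-related settings out of cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {
        "parallelism": parser.getint("core", "parallelism"),
        "dag_concurrency": parser.getint("core", "dag_concurrency"),
        "max_active_runs_per_dag": parser.getint("core", "max_active_runs_per_dag"),
        "max_threads": parser.getint("scheduler", "max_threads"),
    }


limits = read_limits(SAMPLE_CFG)
```

The DAG-level kwargs (concurrency, max_active_runs) are set on the DAG object itself rather than in this file.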

-
Is there a way to specify the parallelism per task? I find that when I am backfilling something like downloading data from an SFTP, I want parallelism to be 4 or 5. However, when I load the data, I want it to be only 1 (if it is more than one, the order in which the data is loaded is not guaranteed. Right now I have SERIAL keys which are out of order because I forgot to turn parallelism back to 1, which is slightly annoying) – trench May 15 '17 at 13:01
-
An airflow worker can be on a separate machine without running off of a separate airflow database instance. I run my airflow workers in docker, I give them a queue url and a db url and it works great! – Sethish Jan 10 '18 at 15:29
-
Is `max_active_runs` still relevant? I can't see it in the default config file: https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_airflow.cfg – Maximilian Nov 13 '18 at 21:08
-
There's also `worker_concurrency` - is that the same as `dag_concurrency`? – Maximilian Nov 13 '18 at 21:09
-
From the Airflow [documentation](https://airflow.apache.org/faq.html#how-can-my-airflow-dag-run-faster): **concurrency**: The Airflow scheduler will run no more than **$concurrency** task instances for your DAG at any given time. Concurrency is defined in your Airflow DAG. If you do not set the concurrency on your DAG, the scheduler will use the default value from the **dag_concurrency** entry in your airflow.cfg. As I understand it, dag_concurrency is the default concurrency (used when you do not set concurrency on the DAG) – mustafagok Feb 20 '19 at 10:42
-
I am shocked that there still hasn't been a backwards compatible rename for newer airflow installations – Arseniy Banayev Jan 25 '20 at 16:20
-
Note there is also `task_concurrency`, which is at the operator level. https://airflow.apache.org/docs/apache-airflow/stable/faq.html#how-can-my-airflow-dag-run-faster – Gabe Dec 16 '20 at 22:37
-
I think the definition of `dag_concurrency` as `max_active_tasks_for_worker` is wrong, since the two are separate things as described [here](https://docs.astronomer.io/learn/airflow-executors-explained#related-definitions) – Dhruv May 30 '23 at 17:37
-
`concurrency` is deprecated in favour of `max_active_tasks` and will be removed in airflow version 3.0 https://airflow.apache.org/docs/apache-airflow/stable/_modules/airflow/models/dag.html#DAG.concurrency – congusbongus Jun 22 '23 at 05:21
It's 2019 and more updated docs have come out. In short:
AIRFLOW__CORE__PARALLELISM is the max number of task instances that can run concurrently across ALL of Airflow (all tasks across all DAGs).
AIRFLOW__CORE__DAG_CONCURRENCY is the max number of task instances allowed to run concurrently FOR A SINGLE SPECIFIC DAG.
These docs describe it in more detail:
According to https://www.astronomer.io/guides/airflow-scaling-workers/:
parallelism is the max number of task instances that can run concurrently on airflow. This means that across all running DAGs, no more than 32 tasks will run at one time.
And
dag_concurrency is the number of task instances allowed to run concurrently within a specific dag. In other words, you could have 2 DAGs running 16 tasks each in parallel, but a single DAG with 50 tasks would also only run 16 tasks - not 32
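The arithmetic in those two quotes can be sketched in a few lines of Python. This is a simplified model, not Airflow's scheduling logic: it assumes every DAG uses the same dag_concurrency and ignores pools, max_active_runs, and executor slots. The function name max_running_tasks is made up for the illustration.

```python
def max_running_tasks(runnable_per_dag, parallelism=32, dag_concurrency=16):
    """Upper bound on concurrently running task instances in a simplified model.

    runnable_per_dag: list with the number of runnable tasks in each DAG.
    Each DAG is capped at dag_concurrency, and the total is capped at parallelism.
    """
    per_dag_capped = [min(n, dag_concurrency) for n in runnable_per_dag]
    return min(parallelism, sum(per_dag_capped))


two_dags = max_running_tasks([16, 16])         # both DAGs run all 16 tasks
one_big_dag = max_running_tasks([50])          # a single DAG is held to dag_concurrency
three_big_dags = max_running_tasks([50, 50, 50])  # the global parallelism cap applies
```

With the values from the quote, two DAGs of 16 tasks reach the global limit of 32, while a single DAG with 50 tasks stays at 16.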
And, according to https://airflow.apache.org/faq.html#how-to-reduce-airflow-dag-scheduling-latency-in-production:
max_threads: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by max_threads with default value of 2. User should increase this value to a larger value(e.g numbers of cpus where scheduler runs - 1) in production.
But it seems like this last piece shouldn't take up too much time, because it's just the "scheduling" portion, not the actual running portion. Therefore we didn't see the need to tweak max_threads much, but AIRFLOW__CORE__PARALLELISM and AIRFLOW__CORE__DAG_CONCURRENCY did affect us.
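The variable names above follow Airflow's AIRFLOW__{SECTION}__{KEY} convention for overriding airflow.cfg entries from the environment. A small sketch of building such overrides in Python; the helper name airflow_env_var and the chosen values are just for illustration.

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key in airflow.cfg."""
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


# Illustrative overrides; the numbers are examples, not recommendations.
overrides = {
    ("core", "parallelism"): 32,
    ("core", "dag_concurrency"): 16,
}

# Environment values must be strings.
env = {airflow_env_var(s, k): str(v) for (s, k), v in overrides.items()}
```

The resulting dict can be merged into os.environ (or passed to a container or process launcher) before the scheduler and workers start.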

The scheduler's max_threads is the number of processes to parallelize the scheduler over. The max_threads cannot exceed the CPU count. The LocalExecutor's parallelism is the number of concurrent tasks the LocalExecutor should run. Both the scheduler and the LocalExecutor use Python's multiprocessing library for parallelism.
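The CPU-count cap can be pictured as a simple clamp. This is a sketch of the rule as stated above, not a copy of Airflow's code; the function name effective_max_threads is hypothetical.

```python
import multiprocessing


def effective_max_threads(requested, cpu_count=None):
    """Clamp a requested max_threads value to the number of CPUs (at least 1)."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return max(1, min(requested, cpu_count))


capped = effective_max_threads(8, cpu_count=4)     # request exceeds CPUs, so it is capped
unchanged = effective_max_threads(2, cpu_count=4)  # request fits, so it is kept
```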

-
Just want to mention that `max_threads` was renamed to `parsing_processes` in the Airflow 1.10.14 [release](https://airflow.apache.org/docs/apache-airflow/stable/changelog.html?highlight=max_threads). – Old Panda Jan 07 '21 at 20:06
-
What should the value of `max_threads` or `parsing_processes`, and the corresponding number of CPU cores, be to have 3 DAGs running in parallel? – Prathamesh dhanawade Jan 15 '21 at 17:11