I need some clarification on the relationship between Airflow's core.parallelism setting, the executor's open slots, and their impact on task duration.
Based on my experiment, the number of executor open slots tracks the core.parallelism value in the Google Cloud Composer Airflow configuration. For example, on a 3-node Composer environment, the default parallelism and the number of open slots are both 30. With this configuration, I added a single DAG (the only DAG in the environment) containing one task. Averaging 10 runs, the task duration came to about 15 seconds.
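For context, this is roughly how I changed the setting between runs. This is a sketch assuming a hypothetical environment named `example-env` in `us-central1`; substitute your own environment name and location:

```shell
# Override the Airflow core.parallelism setting on a Composer environment.
# Composer expects overrides in "section-key=value" form.
gcloud composer environments update example-env \
    --location us-central1 \
    --update-airflow-configs=core-parallelism=10
```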
Next I reduced parallelism to 10, which reduced the open slots to 10 as well, and executed the same DAG with the same task. The average duration remained the same, i.e. 15 seconds.
My question: the default is 30. Suppose my application needs to execute only the DAG above, with no other DAGs present in the environment, and I want to take full advantage of the Composer resources. My expectation was that reducing parallelism would leave fewer slots of higher capacity, since the environment's CPU and RAM would be shared among fewer slots. But the experiment shows the average task duration stays the same.
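My mental model of the experiment can be sketched with a plain thread pool, where the pool size plays the role of parallelism/open slots (this is only an analogy under that assumption, not Airflow itself):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task():
    # Stand-in for the DAG's single task: a fixed-cost unit of work.
    time.sleep(0.2)

def run_single_task(pool_size):
    # pool_size caps how many tasks may run concurrently; a lone task
    # still occupies exactly one worker and takes its own fixed time.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        pool.submit(task).result()
    return time.monotonic() - start

wide = run_single_task(30)    # "parallelism = 30"
narrow = run_single_task(10)  # "parallelism = 10"
# Both durations come out essentially equal: shrinking the pool removes
# idle slots, it does not make the one occupied slot any faster.
print(f"{wide:.2f}s vs {narrow:.2f}s")
```

If this analogy holds for Airflow, it would explain the unchanged 15-second average, but I'd like confirmation.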
Could someone please clarify whether reducing parallelism can make a single task run faster?