I am using the Airflow docker-compose from here, and I am seeing performance issues along with Airflow crashing unexpectedly.

First, I have 5 DAGs running at the same time. Each one has 8 steps, with max_active_runs=1:

 step1x
 step2y
 step3 >> step4 >> step8
 step3 >> step5 >> step8
 step3 >> step6 >> step8
 step3 >> step7 >> step8

I would like to know what configuration I should use to balance Airflow parallelism against stability, i.e. what the maximum recommended values of the options below are for a machine that has X CPUs and Y GB of RAM. I am using a LocalExecutor but can't figure out how I should configure the parallelism:

AIRFLOW__SCHEDULER__SCHEDULER_MAX_THREADS=?
AIRFLOW__CORE__PARALLELISM=?
AIRFLOW__WEBSERVER__WORKERS=?

Is there documentation that states the recommended value for each of these based on the machine's specifications?

deltascience

1 Answer

I'm not sure you have a parallelism problem...yet.

Can you clarify something? Do you have 5 different DAGs with similar set-ups, or is this launching five instances of the same task at once? I'd expect the former because of the max_active_runs setting.

On your task declaration here:

 step1x
 step2y
 step3 >> step4 >> step8
 step3 >> step5 >> step8
 step3 >> step6 >> step8
 step3 >> step7 >> step8

Are you expecting step1x, step2y and step3 to all execute at the same time? Then 4-7 and finally step8? What are you doing in the DAG where you need that kind of process vs 1-8 sequential?
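To make that question concrete, here is a rough sketch (plain Python, no Airflow imports) that works out how many tasks could ever be running at once, given the dependencies above. It assumes step1x and step2y have no upstream or downstream tasks, that each of the 5 DAGs has exactly one active run, and the helper names `descendants` and `peak_concurrency` are made up for illustration. The result is only an upper bound on the task slots this workload can use; it says nothing about what the machine can actually sustain.

```python
from itertools import combinations

# Tasks and edges transcribed from the declaration above.
# Assumption: step1x and step2y are standalone (no dependencies).
TASKS = ["step1x", "step2y", "step3", "step4", "step5", "step6", "step7", "step8"]
MIDDLE = ("step4", "step5", "step6", "step7")
EDGES = [("step3", s) for s in MIDDLE] + [(s, "step8") for s in MIDDLE]


def descendants(task, edges):
    """Return every task reachable downstream of `task`."""
    seen = set()
    stack = [task]
    while stack:
        current = stack.pop()
        for upstream, downstream in edges:
            if upstream == current and downstream not in seen:
                seen.add(downstream)
                stack.append(downstream)
    return seen


def peak_concurrency(tasks, edges):
    """Size of the largest group of tasks with no dependency between any pair.

    Brute force over subsets, which is fine for a graph with 8 tasks.
    """
    reach = {t: descendants(t, edges) for t in tasks}
    for size in range(len(tasks), 0, -1):
        for group in combinations(tasks, size):
            independent = all(
                b not in reach[a] and a not in reach[b]
                for a, b in combinations(group, 2)
            )
            if independent:
                return size
    return 0


per_dag = peak_concurrency(TASKS, EDGES)  # step1x, step2y and step4..step7 together
total = 5 * per_dag                       # 5 DAGs, one active run each
print(per_dag, total)
```

If that reasoning holds, a single run can have at most 6 tasks in flight (step1x and step2y still running while step4 to step7 execute), so 5 DAGs can never occupy more than 30 slots. Raising AIRFLOW__CORE__PARALLELISM beyond that would not add throughput for this workload; how far below it you need to go depends on how heavy each task is.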

  • Yes, all 5 DAGs have the same set-up but use different parameters. Steps 1, 2 and 3 are independent and can run in parallel. The DAGs process data from different sources with different settings. Each source provides data every minute, so I created a DAG for each source. – deltascience Nov 19 '19 at 08:41
  • [This is the best explanation I've found](https://stackoverflow.com/questions/56370720/how-can-i-control-the-parallelism-or-concurrency-of-an-airflow-dag/56370721#56370721) of how the different concurrency settings can be tuned. Might be helpful? – Dan Kleiman Nov 20 '19 at 03:54
  • It is indeed helpful, thank you. But this question is different: I am asking for a recommended configuration relative to the machine's capacity (CPU and RAM) for a LocalExecutor. – deltascience Nov 21 '19 at 08:53