
I am new to Airflow. I was trying to schedule a job that uses a BashOperator to run a spark-submit command. It works, but it keeps Airflow busy until the Spark job completes.

cmd = "ssh hadoop@<ipaddress> spark-submit \
   --master yarn \
   --deploy-mode cluster \
   --executor-memory 2g \
   --executor-cores 2 \
   /home/hadoop/main.py"

t = BashOperator(task_id='task1', bash_command=cmd, dag=dag)

How can I make Airflow just submit the bash command and move on to the next task?

I am currently running Airflow on a standalone EC2 machine.
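One option worth considering: in `--deploy-mode cluster` on YARN, `spark-submit` can be told not to wait for the application to finish by setting `spark.yarn.submit.waitAppCompletion=false`, so the client exits right after the application is accepted and the BashOperator task completes quickly. A minimal sketch of the adjusted command, reusing the host placeholder and paths from the question:

```python
# Sketch: spark-submit command that returns as soon as the YARN application
# is accepted instead of blocking until it finishes. With --deploy-mode
# cluster, spark.yarn.submit.waitAppCompletion=false makes the client exit
# right after submission.
cmd = (
    "ssh hadoop@<ipaddress> spark-submit "
    "--master yarn "
    "--deploy-mode cluster "
    "--conf spark.yarn.submit.waitAppCompletion=false "
    "--executor-memory 2g "
    "--executor-cores 2 "
    "/home/hadoop/main.py"
)
```

The trade-off is that Airflow then only knows whether the submission succeeded, not whether the Spark job itself succeeded or failed; you would need a separate check (or a sensor) if you care about the job's final status.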

Also, how can we make Airflow run multiple tasks at the same time?
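On a standalone machine, the default SequentialExecutor runs only one task at a time; switching to the LocalExecutor allows parallel task runs, provided the metadata database supports concurrent access (e.g. Postgres or MySQL rather than SQLite). A sketch of the relevant `airflow.cfg` settings; the numeric values are illustrative, not recommendations:

```
[core]
executor = LocalExecutor
# upper bound on tasks running at once across the whole Airflow instance
parallelism = 32
# upper bound on concurrent tasks within a single DAG run
# (this setting is named dag_concurrency in Airflow 1.x)
dag_concurrency = 16
```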

yahoo
  • if you are using EMR, have a look at [EMR operators overview](https://airflow.readthedocs.io/en/latest/howto/operator/amazon/aws/emr.html#overview) and [How to submit Spark jobs to EMR cluster from Airflow?](https://stackoverflow.com/a/54092691/3679900). [`EMRAddStepsOperator`](https://airflow.readthedocs.io/en/latest/_api/airflow/providers/amazon/aws/operators/emr_add_steps/index.html#airflow.providers.amazon.aws.operators.emr_add_steps.EmrAddStepsOperator) is non-blocking – y2k-shubham Aug 10 '20 at 05:36
  • I wanted to use one of these, but there is no proper documentation on how to start – yahoo Aug 10 '20 at 05:46
  • 1
    `how can we make airflow run multiple dags at sametime.` -> i think this is a typo; it seems you already know that Airflow natively supports multiple DAGs concurrently. How to run multiple `task`s (within a DAG) concurrently is probably what you are looking for. If thats the case, first do understand that if you DONT wire the operators during DAG creation `task_a >> task_b`, then they will automatically run concurrently. Other thing with wiring (if you continue with that) is [trigger_rules](https://airflow.apache.org/docs/stable/concepts.html#trigger-rules) – y2k-shubham Aug 10 '20 at 06:16
  • @y2k-shubham, thanks for making it clear. Yeah, it should be multiple tasks. Can you please help with making use of the SparkSubmitOperator? I know we have to do some configuration on Airflow, like pointing it at the Spark binaries. It's all in bits and pieces. – yahoo Aug 10 '20 at 06:27
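The EMR route suggested in the comments is non-blocking by design: `EmrAddStepsOperator` only submits a step and returns, and a separate `EmrStepSensor` can wait for completion if needed. A sketch of the step definition such an operator takes (the same dict shape boto3's `add_job_flow_steps` expects); the step name is illustrative, and the script path is reused from the question:

```python
# Sketch of an EMR "step" definition for submitting a Spark job.
# command-runner.jar is the standard EMR mechanism for invoking
# spark-submit on the cluster's master node.
SPARK_STEP = {
    "Name": "run_main_py",          # illustrative step name
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": [
            "spark-submit",
            "--deploy-mode", "cluster",
            "--executor-memory", "2g",
            "--executor-cores", "2",
            "/home/hadoop/main.py",
        ],
    },
}
```

A list of such dicts is what you would pass to `EmrAddStepsOperator(steps=[SPARK_STEP], ...)` along with the cluster's job flow ID; see the linked EMR operators overview for the full wiring.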

0 Answers