I have a requirement where I need to submit a Spark job using Airflow. The Airflow and Hadoop clusters are on different servers.
Currently, the simple solution is to use a BashOperator, SSH into a Hadoop cluster machine, and submit the job from there.
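For context, this is roughly what that SSH-based approach looks like (the user, host name, and jar path are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator  # airflow.operators.bash_operator on Airflow 1.10

with DAG(
    dag_id="spark_job_via_ssh",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # SSH into the Hadoop edge node and run spark-submit there;
    # the Airflow machine only needs SSH access, not Spark binaries.
    submit_job = BashOperator(
        task_id="submit_spark_job",
        bash_command=(
            "ssh airflow@hadoop-edge-node "
            "'spark-submit --master yarn --deploy-mode cluster "
            "/path/to/my_job.jar'"
        ),
    )
```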
But I also want to explore the SparkSubmitOperator.
I have gone through many articles and Stack Overflow questions, but did not find any detailed explanation of how to set up the Airflow server for this to work. I found the Stack Overflow question below, where it is mentioned that we need to have the Spark binaries and configure yarn-site.xml on the Airflow machine.
Is there a way to submit spark job on different server running master
But nowhere did I find how to actually set these things up.
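For reference, this is roughly what I expect the SparkSubmitOperator version of the DAG to look like, assuming the Spark client, the cluster's yarn-site.xml/core-site.xml, and a Spark connection pointing at YARN (conn id "spark_yarn" here) are already available on the Airflow machine; it is exactly this setup that I cannot find documented. The import path assumes the apache-airflow-providers-apache-spark package on Airflow 2.x:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_job_via_spark_submit",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Runs spark-submit locally on the Airflow machine, so the Spark binaries,
    # the Hadoop config files, and the "spark_yarn" connection (master = yarn)
    # must already be configured there.
    submit_job = SparkSubmitOperator(
        task_id="submit_spark_job",
        conn_id="spark_yarn",              # placeholder connection id
        application="/path/to/my_job.jar",
        java_class="com.example.MyJob",    # placeholder main class
        name="my_spark_job",
    )
```

How exactly should the Airflow machine be prepared (Spark binaries, Hadoop config files, Airflow connection) so that this operator can submit to the remote YARN cluster?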