I have an Airflow pipeline, and one of its DAGs contains a Spark job. The job writes to Elasticsearch (I don't know if that's relevant), and I see two options for it (both are sketched below):
- write the job in Scala for better performance
- use PySpark, since the Airflow pipeline itself is defined in Python
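For context, the Airflow side stays in Python either way; here is a minimal sketch of how I would submit each variant with SparkSubmitOperator from the apache-airflow-providers-apache-spark package (the paths, main class, and connection id are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(dag_id="es_indexing", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
    # Option 1: Scala job packaged as a fat JAR
    scala_job = SparkSubmitOperator(
        task_id="scala_job",
        application="/opt/jobs/es-indexer-assembly.jar",  # placeholder path
        java_class="com.example.EsIndexer",               # placeholder main class
        conn_id="spark_default",
    )

    # Option 2: PySpark script
    pyspark_job = SparkSubmitOperator(
        task_id="pyspark_job",
        application="/opt/jobs/es_indexer.py",  # placeholder path
        conn_id="spark_default",
    )
```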
Is one option clearly better for readability/performance/error handling? (I have no preference between Scala and Python.)
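In case it matters, the write itself would go through the elasticsearch-hadoop connector, which looks about the same in both languages; roughly this for the PySpark variant (input path, host, and index name are placeholders, and the connector JAR is assumed to be on the Spark classpath, e.g. via `--packages org.elasticsearch:elasticsearch-spark-30_2.12`):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es_indexer").getOrCreate()

df = spark.read.parquet("/data/events")  # placeholder input

(df.write
    .format("org.elasticsearch.spark.sql")     # elasticsearch-hadoop connector
    .option("es.nodes", "elasticsearch-host")  # placeholder host
    .option("es.port", "9200")
    .mode("append")
    .save("events-index"))  # placeholder target index
```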
Thank you in advance