
I have a PySpark job that is submitted to YARN by Airflow using a SparkSubmitOperator. In the Python file test.py I have this logging:

import logging
logger = logging.getLogger("myapp")
logger.info("this is to log")

The operator looks like this:

spark_etl = SparkSubmitOperator(
    task_id="etl_job",
    name="transform files",
    application="test.py",
    ...
)

I checked the application log in the YARN application manager, but the log was not printed there. I also checked the log for this Airflow task, and it was not printed there either. Could you please help me understand how and where the PySpark application log is saved? Many thanks for your help.

user4046073

1 Answer


When you submit a PySpark/Spark job to YARN, your code is executed inside containers that YARN creates for the driver and the executors. Logging is initialized inside each container at runtime, and whatever it writes goes into that container's local log files on the worker node. Those local files are treated as temporary: once the application finishes, YARN cleans them up (or, if log aggregation is enabled, uploads them to HDFS first).

YARN does not send any of that output back to the client that submitted the job (here, the Airflow worker running spark-submit); the client only receives the status of the application, which is why nothing shows up in the Airflow task log.
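Assuming log aggregation is enabled on your cluster (yarn.log-aggregation-enable=true; this is an assumption about your setup), the container logs are kept in HDFS after the application finishes and can be fetched with yarn logs -applicationId <application_id>, using the application ID shown in the YARN UI. Also note that this applies to cluster deploy mode; with spark-submit --deploy-mode client the driver runs on the Airflow worker itself, so driver-side output would show up in the Airflow task log.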

Check this for more details: Where are logs in Spark on YARN?
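As an aside, separate from where YARN puts the files: with the exact snippet in the question, logger.info(...) produces no output at all, because the root logger defaults to the WARNING level and no handler is configured. A minimal sketch that routes the messages to stdout, so they land in the container's stdout log:

import logging
import sys

# Configure logging once, near the top of test.py. Without a handler
# and an INFO-level threshold, logger.info(...) is silently dropped:
# the root logger defaults to WARNING with no handler attached.
logging.basicConfig(
    stream=sys.stdout,  # the container captures stdout into its log files
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)

logger = logging.getLogger("myapp")
logger.info("this is to log")  # now reaches the container's stdout log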

gabzo