
I am using Python to implement Spark jobs. We want to get the Python logging output from the application into the Spark history server, so we used the method outlined here:

PySpark logging from the executor

However, the problem is that since the yarn_logger initialization only happens in the driver, the executors still run with a Python logging level of WARNING, which means no logs show up from the executors.

In my driver I do the following:

if __name__=='__main__':

    # initialize logging in main
    yarn_logger.YarnLogger.setup_logger()

In the other Python files, I just initialize the Python logging module:

import logging
LOG = logging.getLogger(__name__)

But this results in only the logs emitted in the driver context showing up.

How do I architect this so that yarn_logger is initialized only once per process, regardless of whether the application is running in local mode or cluster mode? I could, of course, initialize yarn_logger in each Python module of my application, but that might cause it to be initialized multiple times if I run the application in local mode.
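For reference, what I have in mind is something like the following minimal sketch. It assumes the YarnLogger class from the linked answer is importable as yarn_logger; the log_setup module name, the init_logging function, and the INFO level are just placeholders I made up:

    # log_setup.py -- hypothetical helper module wrapping the YarnLogger setup
    import logging

    import yarn_logger

    _INITIALIZED = False

    def init_logging(level=logging.INFO):
        """Idempotent setup: safe to call from both driver and executor code."""
        global _INITIALIZED
        if _INITIALIZED:
            return
        yarn_logger.YarnLogger.setup_logger()
        # Raise the root level so executor-side INFO logs are not dropped
        # by the default WARNING level.
        logging.getLogger().setLevel(level)
        _INITIALIZED = True

The idea would be to call log_setup.init_logging() in __main__ on the driver and also at the top of any callable that actually runs on the executors (for example, inside a mapPartitions function), with the module-level flag keeping it from being set up twice in local mode, where driver and executor code share a process.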

