Following the question here: How do I log from my Python Spark script, I have been struggling to get:
a) all of Spark's output into a log file, and
b) my own messages written to that log file from pyspark.
For a) I made the following changes to the log4j.properties config file:
# Set everything to be logged to the file appender
log4j.rootCategory=ALL, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/home/xxx/spark-1.6.1/logging.log
log4j.appender.file.MaxFileSize=5000MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
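I left the console appender definition in place from the default config, but note that only the file appender is attached to the root category; if I also wanted console output, I believe the root line would need both appenders, something like:

log4j.rootCategory=ALL, console, file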
This produces output in the log file. Now, for b), I would like to add my own messages to that log from pyspark, but none of them show up in the log file. Here is the code I am using:
import logging
import sys

# attach a stdout handler to the logger that py4j uses
logger = logging.getLogger('py4j')
#print(logger.handlers)
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
logger.info("TESTING.....")
I can find Spark's own output in the log file, but no "TESTING.....". I have also tried relying on whatever handlers the py4j logger already has, but this does not work either:
import logging
logger = logging.getLogger('py4j')
logger.info("TESTING.....")