Following the question "How do I log from my Python Spark script", I have been struggling to get two things working:

a) All output written to a log file.
b) My own messages written to that log file from pyspark.

For a) I use the following changes to the config file:

# Send everything to the rolling file appender (the console appender below is defined but not attached to the root logger)
log4j.rootCategory=ALL, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/home/xxx/spark-1.6.1/logging.log
log4j.appender.file.MaxFileSize=5000MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

This produces output. For b), I would like to add my own messages to the log from pyspark, but I cannot find any of them in the log file. Here is the code I am using:

import logging
import sys
logger = logging.getLogger('py4j')
#print(logger.handlers)
sh = logging.StreamHandler(sys.stdout)
sh.setLevel(logging.DEBUG)
logger.addHandler(sh)
logger.info("TESTING.....")

I can find output in the log file, but no "TESTING.....". I have also tried using the logger's existing stream without adding a handler, but this does not work either:

import logging
logger = logging.getLogger('py4j')
logger.info("TESTING.....")

disruptive

2 Answers

Works in my configuration:

log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)
LOGGER.info("Hello logger...")
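
As a rough sketch of how this could be wrapped for reuse, assuming `sc` is an already running SparkContext: the helper name `get_log4j_logger` and the logger name `my_app` are made up for illustration. Note that `_jvm` is a private attribute of the context, so it may change between Spark versions, and that this only works on the driver, where `sc` exists.

```python
def get_log4j_logger(sc, name):
    """Return a log4j logger from the JVM behind an existing SparkContext.

    Assumption: `sc` is a live SparkContext. `_jvm` is the py4j view of the
    driver JVM, so the returned object is a Java logger, not a Python one.
    """
    log4j = sc._jvm.org.apache.log4j
    return log4j.LogManager.getLogger(name)


# Hypothetical usage inside a pyspark shell or script where `sc` is defined:
#   logger = get_log4j_logger(sc, "my_app")
#   logger.info("TESTING.....")
```

Because the message is emitted by log4j itself, it should be routed by the same log4j.properties as the rest of Spark's output, which is why it can reach the file appender when Python's own `logging` module does not.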
camaris

To get all output into a log file and to write to that log file from pyspark:

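import logging
import sys

log = logging.getLogger(__name__)
log.setLevel(logging.INFO)  # a logger with no level inherits WARNING and drops INFO
handler = logging.FileHandler("spam.log")
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
log.addHandler(handler)

class LogWriter:
    """File-like object that forwards each non-empty write to a logging call."""
    def __init__(self, log_func):
        self.log_func = log_func

    def write(self, text):
        if text.strip():  # print() emits a separate "\n" write; skip it
            self.log_func(text.rstrip())

    def flush(self):  # part of the file-like API that callers may expect
        pass

# The write attribute of the standard streams is read-only, so swap the stream objects
sys.stderr = LogWriter(log.error)
sys.stdout = LogWriter(log.info)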

With the last two lines in place, everything written to stderr is logged to "spam.log" at ERROR level and everything written to stdout at INFO level. The file is created in the current working directory and nothing appears on the console.

To keep console output, remove those two lines; explicit log.info and log.error calls are then still written to "spam.log".
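
One detail of the pure-Python approach that may be worth checking is the logger level. The sketch below illustrates it under a few assumptions: the logger name `level_demo` and the temporary log path are hypothetical, chosen only so the example does not touch a real "spam.log". A logger with no level of its own inherits WARNING from the root logger, so an INFO record is discarded before any handler sees it.

```python
import logging
import os
import tempfile

# Hypothetical log location for this sketch only.
log_path = os.path.join(tempfile.mkdtemp(), "spam.log")

log = logging.getLogger("level_demo")  # hypothetical logger name
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))
log.addHandler(handler)

# No level set yet: the effective level comes from the root logger
# (WARNING by default), so this INFO record never reaches the handler.
log.info("dropped")

# After lowering the logger's own level, INFO records are written.
log.setLevel(logging.INFO)
log.info("kept")

handler.flush()
with open(log_path) as f:
    contents = f.read()
```

Setting the level on a handler alone is not enough, since the logger's own level is checked first; that may be one reason an `info` call appears to vanish.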

Happy Coding Cheers!!!

Dean