
I am working on a cluster where I do not have permission to change the log4j.properties file to stop the INFO logging while using pyspark (as explained in the first answer here). The following solution, as explained in that question's first answer, works for spark-shell (Scala):

import org.apache.log4j.Logger
import org.apache.log4j.Level

But for Spark with Python (i.e. pyspark), neither that nor the following worked:

Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)

How can I stop the verbose printing of INFO messages in pyspark WITHOUT changing the log4j.properties file?

hmi2015

3 Answers


I used sc.setLogLevel("ERROR") because I didn't have write access to our cluster's log4j.properties file. From the docs:

Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
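For example, in a pyspark script it can look like this (a minimal sketch; in the pyspark shell sc already exists, so skip the SparkContext creation, and the app name is just illustrative):

from pyspark import SparkContext

sc = SparkContext(appName="quiet-logs")  # not needed in the pyspark shell, where sc is provided
sc.setLogLevel("ERROR")                  # suppress INFO (and WARN) messages from this point on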

Galen Long
  • Why does this solution not work for me? sc.setLogLevel("Error") Traceback (most recent call last): File "", line 1, in AttributeError: 'SparkContext' object has no attribute 'setLogLevel' – Jun 22 '16 at 05:53
  • This didn't seem to affect the executor logging for me – Taylor D. Edmiston Sep 01 '17 at 03:49
  • Tried this, but it doesn't seem to disable the log4j logs; I'm still seeing output like this even with the log level set to "OFF": Caused by: java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:347) at scala.None$.get(Option.scala:345) – seiya Jan 24 '18 at 21:15
  • @GalenLong I use pyspark with Python 3.4. I use DataFrames, which are implemented in Scala; that's why I see these Scala log messages. – seiya Feb 05 '18 at 15:32
  • Makes no difference for me – lfk Feb 16 '18 at 03:19
  • Does not work. This does not seem to be the right answer. – James Madison Sep 27 '19 at 01:49
  • For PySpark users finding this question, see https://stackoverflow.com/a/40504350/2523501 for the answer. – yeliabsalohcin Jan 31 '20 at 12:16

This works for me:

import logging
from pyspark import SparkContext

# Raise the py4j gateway logger's level before the SparkContext is created
s_logger = logging.getLogger('py4j.java_gateway')
s_logger.setLevel(logging.ERROR)
spark_context = SparkContext()
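This sets the py4j gateway logger's level before the SparkContext is created. If you are building a SparkSession on Spark 2.x instead, the same approach can be used (a sketch; the app name is just an example):

import logging
from pyspark.sql import SparkSession

# Quiet py4j's gateway logging before the session (and its context) starts up
logging.getLogger('py4j.java_gateway').setLevel(logging.ERROR)

spark = SparkSession.builder.appName("quiet-py4j").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")  # optionally also quiet the JVM-side log4j output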
Oleg Ladygin

From https://stackoverflow.com/a/32208445/3811916:

logger = sc._jvm.org.apache.log4j
logger.LogManager.getLogger("org").setLevel( logger.Level.OFF )
logger.LogManager.getLogger("akka").setLevel( logger.Level.OFF )

does the trick for me. This is essentially how it's done within PySpark's own tests:

class QuietTest(object):
    def __init__(self, sc):
        self.log4j = sc._jvm.org.apache.log4j

    def __enter__(self):
        self.old_level = self.log4j.LogManager.getRootLogger().getLevel()
        self.log4j.LogManager.getRootLogger().setLevel(self.log4j.Level.FATAL)

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.log4j.LogManager.getRootLogger().setLevel(self.old_level)
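Used as a context manager, this raises the root log4j level to FATAL only while the block runs and restores the old level afterwards (a usage sketch; sc is assumed to be an existing SparkContext):

with QuietTest(sc):
    # noisy work runs here with log4j's root logger set to FATAL
    sc.parallelize(range(10)).count()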
eddies