How to turn off INFO logging in Spark?

Question

I installed Spark using the AWS EC2 guide, and I can launch the program fine using the bin/pyspark script to get to the Spark prompt. I can also complete the Quick Start guide successfully.

However, I cannot for the life of me figure out how to stop all of the verbose INFO logging after each command.

I have tried nearly every possible variation of the code below (commenting lines out, setting levels to OFF) in the log4j.properties file in the conf folder of the directory I launch the application from, as well as on each node, and nothing has any effect. I still see INFO log statements printed after each statement I execute.

I am very confused with how this is supposed to work.

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Here is my full classpath when I use SPARK_PRINT_LAUNCH_COMMAND:

Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/bin/java -cp :/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/conf:/root/spark-1.0.1-bin-hadoop2/lib/spark-assembly-1.0.1-hadoop2.2.0.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-core-3.2.2.jar:/root/spark-1.0.1-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

Contents of spark-env.sh:

#!/usr/bin/env bash

# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

# Options read when launching programs locally with 
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program
# - SPARK_CLASSPATH=/root/spark-1.0.1-bin-hadoop2/conf/

# Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_CLASSPATH, default classpath entries to append
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_LIBRARY, to point to your libmesos.so if you use Mesos

# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)
# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1).
# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)
# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests (Default: ‘default’)
# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.
# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.

# Options for the daemons used in the standalone deploy mode:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf"

In a Spark program, after creating the session, you can set the log level like this (Java): `SparkSession spark = SparkSession.builder().master("local").getOrCreate(); spark.sparkContext().setLogLevel("INFO");` – iKing, Jan 28 '20 at 08:05

score 187 · Accepted Answer · edited Sep 07 '16 at 16:12

Just execute this command in the Spark directory:

cp conf/log4j.properties.template conf/log4j.properties

Edit log4j.properties:

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

Replace the first (non-comment) line:

log4j.rootCategory=INFO, console

by:

log4j.rootCategory=WARN, console

Save and restart your shell. It works for me for Spark 1.1.0 and Spark 1.5.1 on OS X.
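The two manual steps above (copy the template, then change INFO to WARN) can also be scripted. Below is a minimal Python sketch, assuming the conf/log4j.properties.template layout used by the Spark 1.x/2.x versions discussed here; set_root_level and quiet_spark_conf are hypothetical helper names, not part of Spark. Spark 3.3 and later ship a log4j2 template with a different syntax, which this sketch does not handle.

```python
import re
from pathlib import Path

# Matches the root-logger line, e.g. "log4j.rootCategory=INFO, console".
# Commented-out lines (starting with "#") are not matched.
ROOT_LINE = re.compile(r"^(log4j\.rootCategory\s*=\s*)[A-Za-z]+", re.MULTILINE)


def set_root_level(properties_text: str, level: str = "WARN") -> str:
    """Return the text with the root category's level replaced by `level`.

    Only the level changes; the appender list after the comma is kept.
    """
    new_text, count = ROOT_LINE.subn(lambda m: m.group(1) + level, properties_text)
    if count == 0:
        raise ValueError("no log4j.rootCategory line found")
    return new_text


def quiet_spark_conf(spark_home: str, level: str = "WARN") -> Path:
    """Create conf/log4j.properties from the template if it is missing,
    then set its root level. Returns the path that was written."""
    conf = Path(spark_home) / "conf"
    target = conf / "log4j.properties"
    source = target if target.exists() else conf / "log4j.properties.template"
    target.write_text(set_root_level(source.read_text(), level))
    return target
```

For example, quiet_spark_conf("/root/spark-1.0.1-bin-hadoop2", "WARN") would write the edited file into that installation's conf folder; the shell still has to be restarted to pick it up.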

answered Sep 30 '14 at 14:36 by poiuytrez, edited Sep 07 '16 at 16:12 by gsamaras

This helped. It's important to realise that log4j.properties doesn't exist unless you create it. On Ubuntu, I didn't need to restart for these changes to take effect. – disruptive Jun 18 '15 at 14:19
Did not work for me. Spark 1.5. RHEL 6. CDH 5.5. Tried creating new file /opt/cloudera/parcels/CDH/etc/spark/conf.dist/log4j.properties and changing like explained above. And also tried editing existing file /etc/spark/conf/log4j.properties. No effect for pyspark shell nor for pyspark-shell. – Tagar Jan 31 '16 at 00:56
Do we need to do this for all the nodes in the Spark cluster? – cloud May 13 '17 at 14:15
This also blocks the INFO logs that I'm emitting manually. How do I restrict it to hide just Spark's INFO logs? – ss301 Feb 21 '22 at 19:35

score 83 · Answer 2 · answered Nov 09 '16 at 10:07

In Spark 2.0 you can also configure it dynamically for your application using setLogLevel:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.\
        master('local').\
        appName('foo').\
        getOrCreate()
    spark.sparkContext.setLogLevel('WARN')

In the pyspark console, a default Spark session is already available as spark.

answered Nov 09 '16 at 10:07 by mdh

This only suppresses the log messages; the actual code is still running in the background. If you look at CPU usage, Spark uses a lot of CPU even when idle. – hurelhuyag Nov 21 '19 at 09:21
This was exactly the solution for PySpark work where the `log4j` isn't accessible. – yeliabsalohcin Jan 31 '20 at 12:15

score 63 · Answer 3 · edited Nov 19 '19 at 10:37

Inspired by pyspark/tests.py, I did:

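def quiet_logs(sc):
    # sc._jvm is the py4j view of the driver JVM; org.apache.log4j is the log4j 1.x package
    log4j = sc._jvm.org.apache.log4j
    # Raise the threshold for every logger under "org" (Spark, Jetty, ...) and "akka"
    log4j.LogManager.getLogger("org").setLevel(log4j.Level.ERROR)
    log4j.LogManager.getLogger("akka").setLevel(log4j.Level.ERROR)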

Calling this just after creating the SparkContext reduced the stderr lines logged for my test from 2647 to 163. However, creating the SparkContext itself logs 163 lines, up to

15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

and it's not clear to me how to adjust those programmatically.
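To measure the effect of a change like quiet_logs (the drop from 2647 to 163 lines above), one option is to tally the captured stderr lines by level. Below is a minimal Python sketch, assuming the console layout from the default template, %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n; count_levels is a hypothetical helper name.

```python
import re
from collections import Counter

# One line produced by "%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n", e.g.
# "15/08/25 10:14:16 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready ..."
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>[^:]+): (?P<message>.*)$"
)


def count_levels(stderr_text: str) -> Counter:
    """Count log lines per level. Lines that don't match the layout
    (stack traces, program output) are ignored."""
    counts = Counter()
    for line in stderr_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts
```

Redirecting the job's stderr to a file and passing its contents to count_levels before and after a change gives a rough per-level comparison.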

answered Aug 25 '15 at 15:46 by FDS, edited Nov 19 '19 at 10:37 by sam

If you have any ideas on how to adjust those lines, please share. – Irene Sep 08 '15 at 16:23
I think there is no direct way to change the default logging level in PySpark until the SparkContext starts, because sc._jvm is created only after the SparkContext is created. You can still change it through the log4j.properties file, though, as discussed in other answers. Spark should provide, for example, a spark.default.logging variable that can be passed to SparkConf as an option to override the default root logger level. – Tagar Sep 13 '16 at 21:16

score 38 · Answer 4 · answered Jan 07 '15 at 08:44

Edit your conf/log4j.properties file and change the following line:

   log4j.rootCategory=INFO, console

to

    log4j.rootCategory=ERROR, console

Another approach is to fire up spark-shell and type in the following:

import org.apache.log4j.Logger
import org.apache.log4j.Level

Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)

You won't see any logs after that.

answered Jan 07 '15 at 08:44 by AkhlD

The latter option works for spark-shell (Scala), but what should you do for pyspark without changing the log4j file? – hmi2015 Sep 09 '15 at 23:42
Changing the log4j properties file to "warn" would be preferable, but otherwise this answer by wannik does work for changing the console log level in pyspark http://stackoverflow.com/a/34487962/127971 – michael Jul 03 '16 at 04:36

score 33 · Answer 5 · answered Dec 28 '15 at 05:09

>>> log4j = sc._jvm.org.apache.log4j
>>> log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

answered Dec 28 '15 at 05:09 by wannik

I used this for pyspark. Works great as a one-liner hack. I still get the silly YarnExecutor died messages, which should not be an error, imho. And so it goes... – jatal May 12 '16 at 04:37
This suppresses the logging after it executes, but there are a lot of INFO logs prior to that point, unfortunately. – DavidJ Jun 24 '16 at 15:26

score 32 · Answer 6 · edited Mar 14 '19 at 20:27

For PySpark, you can also set the log level in your scripts with sc.setLogLevel("FATAL"). From the docs:

Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN

answered Apr 26 '16 at 23:52 by Galen Long, edited Mar 14 '19 at 20:27

Great solution that works for versions of Spark newer than 1.4 (so, since mid-2015). – Jealie Aug 08 '17 at 23:35
I tried this with Spark 1.6.2 and Scala and it does not seem to work – Yeikel Jan 01 '19 at 16:13
@Yeikel This solution is for PySpark. Sorry that wasn't made clear - I'll edit the answer now. – Galen Long Mar 14 '19 at 20:27

score 22 · Answer 7 · answered Oct 26 '18 at 11:55

You can use setLogLevel:

val spark = SparkSession
      .builder()
      .config("spark.master", "local[1]")
      .appName("TestLog")
      .getOrCreate()

spark.sparkContext.setLogLevel("WARN")

answered Oct 26 '18 at 11:55 by USB

score 14 · Answer 8 · answered Aug 08 '14 at 00:11

This may be due to how Spark computes its classpath. My hunch is that Hadoop's log4j.properties file is appearing ahead of Spark's on the classpath, preventing your changes from taking effect.

If you run

SPARK_PRINT_LAUNCH_COMMAND=1 bin/spark-shell

then Spark will print the full classpath used to launch the shell; in my case, I see

Spark Command: /usr/lib/jvm/java/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3.2.1.jar:/root/spark/lib/datanucleus-core-3.2.2.jar:/root/spark/lib/datanucleus-rdbms-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path=:/root/ephemeral-hdfs/lib/native/ -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

where /root/ephemeral-hdfs/conf is at the head of the classpath.
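To check which copy wins for a given launch command, here is a minimal Python sketch (first_log4j_properties is a hypothetical name) that walks a colon-separated classpath like the one printed by SPARK_PRINT_LAUNCH_COMMAND=1. It assumes log4j 1.x default initialization, which loads log4j.properties as a classpath resource, so the earliest directory containing one shadows later ones, unless -Dlog4j.configuration points somewhere else.

```python
import os


def first_log4j_properties(classpath, exists=os.path.exists, sep=":"):
    """Return the log4j.properties path that would most likely be picked up
    from `classpath`, or None if no directory entry has one.

    The class loader returns the first match in classpath order, so an
    earlier directory shadows a later one. Jar entries and empty entries
    (such as the leading ":::" above) are skipped in this sketch.
    """
    for entry in classpath.split(sep):
        if not entry or entry.endswith(".jar"):
            continue
        candidate = entry.rstrip("/") + "/log4j.properties"
        if exists(candidate):
            return candidate
    return None
```

Passing the -cp value from your own launch command, with the default exists=os.path.exists, should point at the file whose edits actually take effect.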

I've opened an issue [SPARK-2913] to fix this in the next release (I should have a patch out soon).

In the meantime, here are a couple of workarounds:

Add export SPARK_SUBMIT_CLASSPATH="$FWDIR/conf" to spark-env.sh.
Delete (or rename) /root/ephemeral-hdfs/conf/log4j.properties.

Thank you. I tried adding that to my spark-env.sh file and also tried deleting the log4j.properties file, but I'm still getting the INFO output. I have added my full classpath to the question. – horatio1701d, Aug 08 '14 at 16:32
Thanks for the extra info. Could you also post the contents of spark-env.sh (you can redact private info, like hostnames)? – Josh Rosen, Aug 08 '14 at 17:58
Thank you. I posted spark-env.sh. Sorry if I am misunderstanding how to get a base setup going. I just left everything as default as possible for now, just to try some testing. – horatio1701d, Aug 08 '14 at 20:05

score 13 · Answer 9 · answered Jul 27 '18 at 08:44

Simply add the following parameter to your spark-submit command:

--conf "spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console"

This overrides the system value temporarily, only for that job. Check the exact property name (log4jspark.root.logger here) in your log4j.properties file.

Hope this helps, cheers!
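If you launch jobs from a script, the same option can be assembled programmatically. Below is a minimal Python sketch; quiet_submit_args and my_job.py are hypothetical names, and the default property name is the one from this answer. It assumes your log4j.properties references that variable, so the -D system property set on the driver JVM is substituted into it.

```python
import shlex


def quiet_submit_args(app, level="WARN", prop="log4jspark.root.logger", extra=()):
    """Build a spark-submit argument list that sets -D<prop>=<level>,console
    on the driver JVM via spark.driver.extraJavaOptions.

    `prop` must match the variable your log4j.properties actually uses
    (log4jspark.root.logger here; the comments report hadoop.root.logger
    and root.logger working on other setups).
    """
    java_opts = f"-D{prop}={level},console"
    return [
        "spark-submit",
        "--conf", f"spark.driver.extraJavaOptions={java_opts}",
        *extra,  # any other spark-submit options, e.g. ("--master", "yarn")
        app,
    ]


if __name__ == "__main__":
    # my_job.py is a placeholder application path
    print(shlex.join(quiet_submit_args("my_job.py")))
```

Run directly (Python 3.8+ for shlex.join), this prints spark-submit --conf spark.driver.extraJavaOptions=-Dlog4jspark.root.logger=WARN,console my_job.py. The list form can also be passed to subprocess.run without shell quoting concerns.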

answered Jul 27 '18 at 08:44 by Gaurav Adurkar

Another one I found useful is that you can specify the log4j.properties file: `--conf spark.driver.extraJavaOptions='-Dlog4j.configuration=file:/home/foobar/log4j.properties'` – selle Sep 06 '19 at 16:08
Using Spark 2.4.7, the setting `hadoop.root.logger` from @oleksii's answer works perfectly: `--conf "spark.driver.extraJavaOptions=-Dhadoop.root.logger=WARN,console"` – edrabc Nov 02 '21 at 18:41
Thank you indeed, this is what I wanted. Unfortunately, neither -Dlog4jspark.root.logger nor -Dhadoop.root.logger worked, and partly out of frustration I tried -Droot.logger and it **worked**. FWIW, and in case it helps someone else, this is what I did: --conf "spark.driver.extraJavaOptions=-Droot.logger=FATAL,console". – alwayslearning Dec 30 '21 at 17:26

score 10 · Answer 10 · answered Mar 18 '18 at 10:51

Spark 1.6.2:

log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)

Spark 2.x:

spark.sparkContext.setLogLevel('WARN')

(spark being the SparkSession)
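The two snippets above can be folded into one helper that works on either version. Below is a minimal sketch, assuming sc is a PySpark SparkContext (pass spark.sparkContext on 2.x) and, for the fallback branch only, that the log4j 1.x API is on the driver classpath; quiet_spark is a hypothetical name.

```python
def quiet_spark(sc, level="WARN"):
    """Lower the log level for an existing SparkContext `sc`.

    Uses sc.setLogLevel when it exists (Spark 1.4+); otherwise falls back to
    setting the log4j 1.x root logger through the py4j gateway (sc._jvm),
    as in the Spark 1.6.2 snippet above. Returns which path was used.
    """
    if hasattr(sc, "setLogLevel"):
        sc.setLogLevel(level)
        return "setLogLevel"
    # Fallback for older versions: talk to log4j directly in the driver JVM
    log4j = sc._jvm.org.apache.log4j
    log4j.LogManager.getRootLogger().setLevel(log4j.Level.toLevel(level))
    return "log4j"
```

As with the snippets above, messages logged before the call (for example while the SparkContext starts up) are not affected.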

Alternatively, the old method:

Rename conf/log4j.properties.template to conf/log4j.properties in the Spark directory.

In the log4j.properties, change log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console

Different log levels available:

OFF (most specific, no logging)
FATAL (most specific, little data)
ERROR - Log only in case of Errors
WARN - Log only in case of Warnings or Errors
INFO (Default)
DEBUG - Log detailed steps (and all logs stated above)
TRACE (least specific, a lot of data)
ALL (least specific, all data)
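The list above runs from most restrictive (OFF) to least restrictive (ALL): setting a threshold keeps messages at that level and every more severe level. Below is a small Python model of that comparison; it does not call Spark or log4j, and the names are illustrative.

```python
# log4j levels from least to most severe; ALL and OFF are threshold-only markers.
LEVEL_ORDER = ["ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"]


def is_printed(message_level: str, threshold: str) -> bool:
    """True if a message logged at `message_level` passes `threshold`."""
    if threshold == "OFF":
        return False
    return LEVEL_ORDER.index(message_level) >= LEVEL_ORDER.index(threshold)


def visible_levels(threshold: str) -> list:
    """Message levels (excluding the ALL/OFF markers) shown at `threshold`."""
    return [lvl for lvl in LEVEL_ORDER[1:-1] if is_printed(lvl, threshold)]
```

For example, visible_levels("WARN") gives ["WARN", "ERROR", "FATAL"], which is why switching rootCategory from INFO to WARN hides the INFO chatter but still shows warnings and errors.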

score 10 · Answer 11 · answered Jul 08 '19 at 16:14

Programmatic way

spark.sparkContext.setLogLevel("WARN")

Available Options

ERROR
WARN 
INFO

answered Jul 08 '19 at 16:14 by loneStar

score 6 · Answer 12 · answered Mar 04 '15 at 15:49

I used this with Amazon EC2 with 1 master and 2 slaves and Spark 1.2.1.

# Step 1. Change config file on the master node
nano /root/ephemeral-hdfs/conf/log4j.properties

# Before
hadoop.root.logger=INFO,console
# After
hadoop.root.logger=WARN,console

# Step 2. Replicate this change to slaves
~/spark-ec2/copy-dir /root/ephemeral-hdfs/conf/

score 2 · Answer 13 · answered Apr 28 '16 at 22:24

The way I do it: in the directory where I run the spark-submit script, I do

$ cp /etc/spark/conf/log4j.properties .
$ nano log4j.properties

Change INFO to whatever logging level you want, then run your spark-submit.

answered Apr 28 '16 at 22:24 by user3827333

`cp /etc/spark/conf/log4j.properties.template .` – deepelement Dec 16 '16 at 22:40

score 2 · Answer 14 · answered May 03 '16 at 14:43

If you want to keep using logging (the logging facility for Python), you can try splitting the configurations for your application and for Spark:

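import logging

LoggerManager()  # the author's own logging setup helper; not shown in the answer
logger = logging.getLogger(__name__)
# py4j is the bridge PySpark uses to talk to the JVM; raise only its threshold
loggerSpark = logging.getLogger('py4j')
loggerSpark.setLevel('WARNING')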
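To see how that split behaves, here is a small self-contained demo using only the standard library. It assumes py4j's own loggers live under the "py4j" name (for example py4j.java_gateway) and simulates their messages; myapp and ListHandler are illustrative names. This only affects records sent through Python's logging module: the INFO lines printed by Spark's JVM come from log4j and are not changed by it.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted records in memory (stands in for a console handler)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname} {record.name}: {record.getMessage()}")


handler = ListHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

app_logger = logging.getLogger("myapp")              # your application's logger
logging.getLogger("py4j").setLevel(logging.WARNING)  # quiet the py4j bridge only

app_logger.info("job started")                                   # kept: INFO passes the root level
logging.getLogger("py4j.java_gateway").info("Received command")  # dropped: parent "py4j" is at WARNING
logging.getLogger("py4j.java_gateway").warning("Gateway issue")  # kept
```

After running it, handler.lines holds only the application's INFO message and the py4j warning, because the child logger py4j.java_gateway inherits its effective level from py4j.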

Answer 15 · Ram Ghadiyaram · May 10 '19 at 23:32

The code snippets below are for Scala users.

Option 1:

You can add the snippet below at the file level:

import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.WARN)

Option 2:

Note: this applies to the whole application that uses the Spark session.

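import org.apache.spark.sql.SparkSession

object Main extends App {  // enclosing object added so the snippet compiles; the name is arbitrary
  private[this] implicit val spark: SparkSession = SparkSession.builder().master("local[*]").getOrCreate()

  spark.sparkContext.setLogLevel("WARN")
}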

Option 3:

Note: add this configuration to your log4j.properties, for example /etc/spark/conf/log4j.properties (where Spark is installed) or a log4j.properties at your project folder level. Since you are changing it at the module level, it applies to all applications.

log4j.rootCategory=ERROR, console

IMHO, Option 1 is the wisest, since it can be switched off at the file level.

score 1 · Answer 16 · answered Jun 12 '21 at 11:32

You can also set it programmatically like this, at the beginning of your program:

import org.apache.log4j.{Level, Logger}

Logger.getLogger("org").setLevel(Level.WARN)

answered Jun 12 '21 at 11:32 by rahul sharma
