How can I reduce the amount of trace info the Spark runtime produces?

The default is too verbose.

How do I turn it off, and turn it back on when I need it?

Thanks

Verbose mode

scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
15/01/28 09:57:24 INFO SparkContext: Starting job: collect at <console>:15
15/01/28 09:57:24 INFO DAGScheduler: Got job 3 (collect at <console>:15) with 1 output 
...
15/01/28 09:57:24 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/01/28 09:57:24 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 626 bytes result sent to driver
15/01/28 09:57:24 INFO DAGScheduler: Stage 3 (collect at <console>:15) finished in 0.002 s
15/01/28 09:57:24 INFO DAGScheduler: Job 3 finished: collect at <console>:15, took 0.020061 s
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)

Silent mode (expected)

scala> val la = sc.parallelize(List(12,4,5,3,4,4,6,781))
scala> la.collect
res5: Array[Int] = Array(12, 4, 5, 3, 4, 4, 6, 781)
Jacek Laskowski
newBike
  • This isn't the Scala REPL that's printing that, but the Spark runtime. So you need to be looking at options for controlling Spark's output. – The Archetypal Paul Jan 28 '15 at 10:18
  • 2
    I think this question has more to do with Spark and the logger used by spark than with Scala. So spark uses log4j for loggin, which can be configured in "SPARK_HOME/conf/log4j.properties.template" . Change the settings corresponding to console howerver you want. More specifically change first line to "log4j.rootCategory=ERROR, console" . – sarveshseri Jan 28 '15 at 10:22

4 Answers

Spark 1.4.1

sc.setLogLevel("WARN")

From the comments in the source code:

Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
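
For example, in the spark-shell you can lower the level before a noisy job and raise it again afterwards. A minimal sketch (sc is the SparkContext the shell already provides):

sc.setLogLevel("WARN")                                     // suppress the INFO trace from here on
val la = sc.parallelize(List(12, 4, 5, 3, 4, 4, 6, 781))
la.collect()                                               // runs without the INFO lines
sc.setLogLevel("INFO")                                     // turn the verbose output back on when needed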

Spark 2.x - 2.3.1

sparkSession.sparkContext().setLogLevel("WARN")

Spark 2.3.2

sparkSession.sparkContext.setLogLevel("WARN")
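
In the Scala spark-shell, where the session is already exposed as spark and sparkContext is a field (hence no parentheses), a minimal sketch of the same call:

spark.sparkContext.setLogLevel("WARN")   // only WARN and above from this point
spark.sparkContext.setLogLevel("INFO")   // switch back to INFO when you need the full trace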

AlleyOOP
user5771281
  • Works great in 1.6.1! I had trouble getting Spark to read properties from my $SPARK_HOME/conf directory, but I like setting properties at runtime even better - you still get INFO messages during SparkContext initialization, but from then on it follows whatever you set. Great when collaborating, so everyone has the same config. – tlegutko May 27 '16 at 16:56
  • What is the `sc` variable? – Moebius Jul 13 '16 at 12:43
  • @Moebius That would be your instance of SparkContext. – LiMuBei Sep 07 '16 at 14:33
  • In Spark 2.x you need to set sparkSession.sparkContext().setLogLevel("WARN"). – nomad Aug 16 '17 at 18:39

Quoting from the 'Learning Spark' book:

You may find the logging statements that get printed in the shell distracting. You can control the verbosity of the logging. To do this, you can create a file in the conf directory called log4j.properties. The Spark developers already include a template for this file called log4j.properties.template. To make the logging less verbose, make a copy of conf/log4j.properties.template called conf/log4j.properties and find the following line:

log4j.rootCategory=INFO, console

Then lower the log level so that we only show WARN messages and above by changing it to the following:

log4j.rootCategory=WARN, console

When you re-open the shell, you should see less output.

Shyamendra Solanki

Logging configuration at the Spark app level

With this approach, no code change is needed on the cluster for a Spark application.

  • Let's create a new file log4j.properties from log4j.properties.template.
  • Then change the verbosity with the log4j.rootCategory property.
  • Say we only need to see ERRORs for a given jar; then set log4j.rootCategory=ERROR, console

The spark-submit command would be:

spark-submit \
    ... # other Spark props go here
    --files prop/file/location \
    --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=prop/file/location' \
    jar/location \
    [application arguments] 

Now you will see only the logs that are categorised as ERROR.


Plain Log4j way without Spark (but needs a code change)

Set the logging level for the org and akka packages (ERROR here; Level.OFF would silence them entirely):

import org.apache.log4j.{Level, Logger}

Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)
mrsrinivas

If you are invoking a command from a shell, there is a lot you can do without changing any configurations. That is by design.

Below are a couple of Unix examples using pipes, but you could do similar filters in other environments.

To completely silence the log (at your own risk)

Pipe stderr to /dev/null, i.e.:

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 2> /dev/null

To ignore INFO messages

run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999 | awk '{if ($3 != "INFO") print $0}'

Leo