I have been using PySpark and have a problem with logging: logs from the Spark JVM are written to STDOUT, and I have no control over that from Python.

For example, logs such as this one are being piped to STDOUT instead of STDERR:

2018-03-12 09:50:10 WARN Utils:66 - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.

Spark is not installed in the environment, only Python and Pyspark.

How do I:

A. Redirect all logs to STDERR

OR

B. If that is not possible, disable the logs.


Things I have tried:

  1. I have tried to use pyspark.SparkConf(), but nothing I configure there seems to have any effect on logging.
  2. I have tried creating a SparkEnv.conf file and setting SPARK_CONF_DIR to match, just to check whether I could at least disable the example log above, to no avail.
  3. I have looked through the documentation but found no indication of how to accomplish this.
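For reference on attempt 2: console log routing in Spark is controlled by log4j, not by SparkEnv.conf (the warning message is misleading on this point). A minimal sketch of a log4j.properties, assuming Spark 2.x (which bundles log4j 1.x) and a SPARK_CONF_DIR pointing at the directory that contains it, which sends console output to STDERR and raises the threshold to ERROR:

```properties
# Send Spark's console appender to STDERR instead of STDOUT
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Whether Spark picks this file up depends on SPARK_CONF_DIR actually being consulted by the pip-installed pyspark launcher, so treat this as a sketch rather than a guaranteed fix.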
Inbar Rose

1 Answer


You can set the log level to ERROR, so that only ERROR-level logs are shown:

sc.setLogLevel("ERROR")  # sc is a SparkContext() object from the pyspark lib

But if you want to disable all PySpark logs, you can turn logging off entirely:

sc.setLogLevel("OFF")

Check this Stack Thread

JordiSilv
  • I am sorry, but where is the `sc` object defined? What is it? – Inbar Rose Mar 12 '18 at 12:59
  • The `sc` object is the SparkContext. If you are using PySpark through an integrated shell or IDE, it automatically starts a new Spark context and provides you with a SparkContext object named `sc` and a SparkSession named `spark`. If you are not using an integrated setup and are importing the libraries by hand, the `sc` object I refer to is the one you create with SparkContext(), SparkContext.getOrCreate(), or similar instructions. – JordiSilv Mar 13 '18 at 07:23
  • Thanks, and yes, I have been doing it manually. It seems to solve my problem. Thanks a bunch. – Inbar Rose Mar 13 '18 at 12:28