25

I launch PySpark applications from PyCharm on my own workstation against an 8-node cluster. The cluster also has settings encoded in spark-defaults.conf and spark-env.sh.

This is how I obtain my spark context variable.

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .master("spark://stcpgrnlp06p.options-it.com:7087") \
        .appName(__SPARK_APP_NAME__) \
        .config("spark.executor.memory", "50g") \
        .config("spark.eventLog.enabled", "true") \
        .config("spark.eventLog.dir", r"/net/share/grid/bin/spark/UAT/SparkLogs/") \
        .config("spark.cores.max", 128) \
        .config("spark.sql.crossJoin.enabled", "True") \
        .config("spark.executor.extraLibraryPath","/net/share/grid/bin/spark/UAT/bin/vertica-jdbc-8.0.0-0.jar") \
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .config("spark.logConf", "true") \
        .getOrCreate()

    sc = spark.sparkContext
    sc.setLogLevel("INFO")

I want to see the effective config that is being used in my log. This line

        .config("spark.logConf", "true") \

should cause Spark to log its effective config at INFO level, but the default log level is WARN, so I don't see any of those messages.

Setting this line

sc.setLogLevel("INFO")

shows INFO messages going forward, but it's too late by then: the config has already been logged (and filtered out) during startup.

How can I set the default logging level that spark starts with?

ThatDataGuy
  • 4
    Possible duplicate of [How to stop messages displaying on spark console?](https://stackoverflow.com/questions/27781187/how-to-stop-messages-displaying-on-spark-console) – Ani Menon Nov 12 '17 at 06:28

3 Answers

10

You can also update the log level programmatically: set it on the Spark context, then get a log4j logger from the JVM through the Spark session:

    def update_spark_log_level(spark, log_level='INFO'):
        # Change the log level of the running application
        spark.sparkContext.setLogLevel(log_level)
        # Reach log4j on the JVM side through the Py4J gateway
        log4j = spark._jvm.org.apache.log4j
        logger = log4j.LogManager.getLogger("my custom Log Level")
        return logger


Use:

    logger = update_spark_log_level(spark, 'DEBUG')
    logger.info('your log message')

Feel free to comment if you need more details.
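Since the level name is passed straight to `setLogLevel`, it can help to normalize and check it first. The following is a minimal sketch, not part of the original answer: `normalize_log_level` and `VALID_LEVELS` are hypothetical names, and the list of levels is the one given in the `SparkContext.setLogLevel` documentation. Note that this still only affects logging after the context exists, so it does not solve the startup-logging problem from the question.

```python
# Level names accepted by SparkContext.setLogLevel, per its documentation.
VALID_LEVELS = ("ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN")


def normalize_log_level(level):
    """Return the upper-case level name, or raise ValueError if it is unknown.

    Hypothetical helper: it only validates the string and never touches Spark.
    """
    name = str(level).strip().upper()
    if name not in VALID_LEVELS:
        raise ValueError(
            "Unknown log level %r; expected one of %s" % (level, ", ".join(VALID_LEVELS))
        )
    return name


# Intended use with an existing SparkSession called `spark` (assumed):
#     spark.sparkContext.setLogLevel(normalize_log_level("debug"))
```

Passing the upper-case form avoids relying on how a particular Spark version treats lower-case names.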

Suresh
8

http://spark.apache.org/docs/latest/configuration.html#configuring-logging

Configuring Logging

Spark uses log4j for logging. You can configure it by adding a log4j.properties file in the conf directory. One way to start is to copy the existing log4j.properties.template located there.


The following blog post, "How to log in spark" (https://www.mapr.com/blog/how-log-apache-spark), suggests a way to configure log4j, including how to direct INFO-level logs to a file.

Yaron
  • 1
    Ok, so is it this setting? log4j.logger.org.apache.spark.repl.Main=INFO – ThatDataGuy Nov 15 '16 at 13:28
  • @ThatDataGuy - added info on how to configure log4j (and tested that the output file does hold INFO-level logs). Note that the sample configuration directs logs to /var/log; you'll need to point the log at a directory that is writable by the user running Spark – Yaron Nov 15 '16 at 14:26
  • where do I create `conf` directory? Next to `src`? Here `src/main/resources/conf/log4j.properties`? This is confusing – Geoff Langenderfer Dec 25 '22 at 23:39
  • 1
    @GeoffLangenderfer in my Dockerfile I'm using the following command when creating the spark docker image: `COPY ./src/main/resources/log4j.properties /configuration` – Yaron Dec 28 '22 at 12:41
8

You need to edit your $SPARK_HOME/conf/log4j.properties file (create it if you don't have one). Now if you submit your code via spark-submit, then you want this line:

log4j.rootCategory=INFO, console

If you want INFO-level logs in your pyspark console, then you need this line:

log4j.logger.org.apache.spark.api.python.PythonGatewayServer=INFO
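As a rough illustration of the "create it if you don't have one" step, here is a minimal Python sketch that makes sure a log4j.properties file exists in a conf directory and sets the root level in it. The function name `ensure_log4j_properties`, the `conf_dir` argument, and the fallback contents in `MINIMAL_CONFIG` are assumptions made for this example; the fallback mirrors the console appender settings of the log4j 1.x template shipped with Spark. Newer Spark releases that moved to Log4j 2 use a different file name and syntax, which this sketch does not cover.

```python
import os
import shutil

# Fallback contents (assumption): a console-only log4j 1.x configuration.
MINIMAL_CONFIG = """\
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
"""


def ensure_log4j_properties(conf_dir, level="INFO"):
    """Create conf_dir/log4j.properties if missing and set the root log level."""
    target = os.path.join(conf_dir, "log4j.properties")
    template = os.path.join(conf_dir, "log4j.properties.template")

    if not os.path.exists(target):
        if os.path.exists(template):
            # Start from the template, as the Spark docs suggest.
            shutil.copyfile(template, target)
        else:
            with open(target, "w") as f:
                f.write(MINIMAL_CONFIG)

    with open(target) as f:
        lines = f.read().splitlines()

    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith("log4j.rootCategory="):
            # Swap only the level; keep whatever appenders were listed.
            key, _, value = line.strip().partition("=")
            parts = value.split(",")
            parts[0] = level
            lines[i] = key + "=" + ",".join(parts)
            replaced = True
    if not replaced:
        # Assumes an appender named "console" is defined elsewhere in the file.
        lines.insert(0, "log4j.rootCategory=%s, console" % level)

    with open(target, "w") as f:
        f.write("\n".join(lines) + "\n")
    return target
```

It would be called with the conf directory of the Spark installation that actually launches the driver, for example `ensure_log4j_properties(os.path.join(os.environ["SPARK_HOME"], "conf"))`, before the session is created.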

Michał Jabłoński
  • I have a package that has spark as a dependency. I jar it up and send it to s3. Where is spark home in this case? `src/main/resources/log4j.properties`? – Geoff Langenderfer Dec 25 '22 at 23:37