
I am using Spark 2.4.4 and I want to redirect the logs of a specific class to a specific logger, with a File appender.

Problem is, whenever I specify a log4j2.properties, it seems to somehow override the settings of Spark's root logger. I have no idea where the latter is configured; I don't want to have anything to do with it and would like to leave it "as is".
What I do want though is to log my own way in a separate logger.

I would still want to use a property file if possible, as doing it programmatically is a boilerplate code festival.

import org.apache.logging.log4j.LogManager

object StatusHolder {
  // With Log4j 1.x you would need @transient lazy for serialization:
  // @transient lazy val logger = Logger.getLogger(getClass.getName)
  @transient lazy val logger = LogManager.getLogger(getClass.getName)

  // log stuff

}


Mehdi LAMRANI
  • I found this file, which is probably included in the Spark build: https://github.com/apache/spark/blob/master/conf/log4j.properties.template. It needs to be edited and included in my jar. I will try that based on the properties syntax as shown here: https://logging.apache.org/log4j/2.x/manual/configuration.html – Mehdi LAMRANI Jan 01 '20 at 12:03
  • 1
    This could help: https://stackoverflow.com/questions/23322602/log4j-config-different-logs-to-different-files – blackbishop Jan 01 '20 at 15:15
  • @blackbishop I searched the topic on SOF as well after posting and found that too; it was helpful. The hard part was finding the right syntax, making it work, and integrating it nicely with the existing logging logic from Spark. I finally managed to get it working and will be posting the result here soon – Mehdi LAMRANI Jan 01 '20 at 20:28

1 Answer


After some experimenting and fumbling, I got this working.

Note that I like to keep up to date with the latest lib versions, but Spark is still stuck with Log4j 1.x
and lacks support for Log4j 2 as of today (early 2020). I made it work somehow anyway.

Here are the complete and detailed steps you need to go through:

1 - Code for log4j2.properties
The XML syntax is easier to follow, but the properties syntax is more straightforward once you finally get it.
It is also the Spark way of doing it, so better to stick with it in this context.

name = MySparkLog4JConf

appenders = CA,FA

appender.CA.type = Console
appender.CA.name = CONSOLE
appender.CA.layout.type = PatternLayout
appender.CA.layout.pattern = [%-5level] %d{yy-MM-dd HH:mm:ss}  %c{1} - %msg%n

appender.FA.type = File
appender.FA.name = FILE
appender.FA.append = false
appender.FA.fileName = /etc/log/spark-custom-log-${date:yyyyMMdd_HHmmss}.out
appender.FA.layout.type = PatternLayout
appender.FA.layout.pattern = %d{HH:mm:ss}  %p %c{1}: %m%n
appender.FA.filter.threshold.type = ThresholdFilter
appender.FA.filter.threshold.level = INFO

loggers = StatusLogger

# Use your own class's fully qualified name here
logger.StatusLogger.name = org.apache.spark.StatusLogger
logger.StatusLogger.level = INFO
# additivity = false is IMPORTANT to decontaminate your console,
# as there is a hidden root logger not defined explicitly here
logger.StatusLogger.additivity = false
logger.StatusLogger.appenderRefs = CA, FA
logger.StatusLogger.appenderRef.FA.ref = FILE
logger.StatusLogger.appenderRef.CA.ref = CONSOLE
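
As a side note (this is an assumption on my part, not something strictly required by the steps below if you bundle the file in your jar): if you submit the job with spark-submit, you can also point the JVM at this file explicitly using the standard Log4j 2 `log4j.configurationFile` system property through Spark's `extraJavaOptions`. Paths and the application class here are placeholders:

```
spark-submit \
  --files log4j2.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configurationFile=log4j2.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configurationFile=log4j2.properties" \
  --class com.example.MyApp my-app.jar
```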

2 - You will need the following Maven dependencies included (adapt for SBT):

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api-scala_2.11</artifactId>
      <version>11.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>2.11.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>2.11.0</version>
    </dependency>

3 - You will probably also need to reference the shade plugin and include Log4j 2 in your jar if you're deploying your code to Spark (via spark-shell or spark-submit):

     <artifactSet>
          <includes>
               <include>org.apache.logging.log4j:*</include>
          </includes>
      </artifactSet>
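
For context, the `<artifactSet>` fragment above sits inside the maven-shade-plugin configuration. A minimal sketch of the surrounding plugin block could look like this (the plugin version is an assumption, pick whatever current version suits your build):

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <include>org.apache.logging.log4j:*</include>
          </includes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```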

4 - Call your logger from your Spark code. You can call this StatusLogger from anywhere in your code, directly, as it is a static object.

import org.apache.logging.log4j.LogManager

// Mixing in the log4j-api-scala "Logging" trait raised a syntax exception,
// probably because this is a (static) Scala object and not a class, so I had to do without it

object StatusLogger {

  // With Log4j 1.x you would need @transient lazy for serialization:
  // @transient lazy val logger = Logger.getLogger(getClass.getName)

  // For a Scala object, getClass.getName appends a "$",
  // which breaks the logger name, so I had to pass it manually
  val logger = LogManager.getLogger("org.apache.spark.StatusLogger")

  def toLogger(m: String): Unit = {
    logger.info(m)
  }
}
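
As an aside, the "$" behaviour mentioned in the comment above can be observed (and worked around) with plain Scala, no Log4j needed. This is just an illustrative sketch; `NameDemo` is a hypothetical object, and `stripSuffix("$")` is one way to recover a usable logger name instead of hardcoding it:

```scala
// Minimal sketch: a Scala object's runtime class name carries a trailing "$",
// which is why getClass.getName is not directly usable as a logger name.
object NameDemo {
  // stripSuffix("$") recovers the plain object name
  def loggerName: String = getClass.getName.stripSuffix("$")

  def main(args: Array[String]): Unit = {
    println(getClass.getName) // ends with "$"
    println(loggerName)
  }
}
```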

5 - Note that your default Spark logs will remain pristine and untouched by this change. The setting only affects the loggers and appenders you configure this way. (I actually wonder why the Spark settings are not overridden, and how that works (anybody?), but it is exactly the result I was aiming for.)

Mehdi LAMRANI