After some experimenting and fumbling, I got this working.
Note that I like to keep up to date with the latest library versions, and Spark is still stuck on Log4j v1 with no support for Log4j 2 as of today (early 2020), but I made it work somehow.
Here are the complete and detailed steps you need to go through:
1 - Code for log4j2.properties
The XML syntax is easier to follow, but the properties format is more straightforward once you finally get it. It is also the Spark way of doing things, so better to stick with it in this context (a note on pointing Spark at this file follows it).
name = MySparkLog4JConf

appenders = CA,FA

appender.CA.type = Console
appender.CA.name = CONSOLE
appender.CA.layout.type = PatternLayout
appender.CA.layout.pattern = [%-5level] %d{yy-MM-dd HH:mm:ss} %c{1} - %msg%n

appender.FA.type = File
appender.FA.name = FILE
appender.FA.append = false
appender.FA.fileName = /etc/log/spark-custom-log-${date:yyyyMMdd_HHmmss}.out
appender.FA.layout.type = PatternLayout
appender.FA.layout.pattern = %d{HH:mm:ss} %p %c{1}: %m%n
# Log4j 2 has no Log4j 1 style "Threshold" attribute; a ThresholdFilter caps this appender at INFO
appender.FA.filter.threshold.type = ThresholdFilter
appender.FA.filter.threshold.level = INFO

loggers = StatusLogger

# <= the logger name must match the class you log from
logger.StatusLogger.name = org.apache.spark.StatusLogger
logger.StatusLogger.level = INFO
# additivity = false is IMPORTANT to decontaminate your console: there is a hidden
# root logger, not defined explicitly here, that would otherwise duplicate every event
logger.StatusLogger.additivity = false
logger.StatusLogger.appenderRefs = CA, FA
logger.StatusLogger.appenderRef.FA.ref = FILE
logger.StatusLogger.appenderRef.CA.ref = CONSOLE
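Spark only auto-loads the Log4j 1 log4j.properties, so you have to point the driver and executor JVMs at this file yourself. One way to do it with spark-submit (a sketch: com.example.MyApp and my-app.jar are placeholders, and it assumes log4j2.properties sits in the directory you submit from and is shipped to the executors with --files):

spark-submit \
  --files log4j2.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configurationFile=log4j2.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configurationFile=log4j2.properties" \
  --class com.example.MyApp my-app.jar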
2 - You will need the following Maven dependencies included (for SBT, see the sketch after the XML)
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api-scala_2.11</artifactId>
    <version>11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.11.0</version>
</dependency>
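For SBT, a minimal equivalent sketch (assuming Scala 2.11, to match the log4j-api-scala_2.11 artifact above):

// build.sbt - %% appends the Scala binary version (_2.11) to the artifact name
libraryDependencies ++= Seq(
  "org.apache.logging.log4j" %% "log4j-api-scala" % "11.0",
  "org.apache.logging.log4j" %  "log4j-api"       % "2.11.0",
  "org.apache.logging.log4j" %  "log4j-core"      % "2.11.0"
)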
3 - You will probably also need to reference the Shade plugin and include Log4j in your jar if you are deploying your code to Spark (via spark-shell or spark-submit). This fragment goes inside the plugin's <configuration> section (a fuller sketch follows it):
<artifactSet>
    <includes>
        <include>org.apache.logging.log4j:*</include>
    </includes>
</artifactSet>
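If you do not have the Shade plugin declared yet, here is a minimal sketch of where that fragment lives (the plugin version is an assumption, pick whatever is current for you):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.1</version> <!-- assumed version, any recent one should do -->
    <executions>
        <execution>
            <!-- bind the shade goal to the package phase so mvn package builds the fat jar -->
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <artifactSet>
                    <includes>
                        <include>org.apache.logging.log4j:*</include>
                    </includes>
                </artifactSet>
            </configuration>
        </execution>
    </executions>
</plugin>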
4 - Call your logger from your Spark code
You can call this StatusLogger from wherever you want in your code, directly, as it is static (a usage sketch follows the object).
import org.apache.logging.log4j.LogManager

//Mixing in "with Logging" (org.apache.logging.log4j.scala.Logging) raised a syntax
//exception, probably because this is a (static) Scala object and not a class,
//so I had to do without it
object StatusLogger {

  //if you are on Log4j 1 you need @transient lazy for serde:
  //import org.apache.log4j.Logger
  //@transient lazy val logger = Logger.getLogger(getClass.getName)

  //for some reason in Scala "getClass.getName" appends a "$" for this object,
  //which screws up the class name, so I had to pass it manually
  val logger = LogManager.getLogger("org.apache.spark.StatusLogger")

  def toLogger(m: String): Unit = {
    logger.info(m)
  }
}
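For instance, here is a hypothetical call site; the Example object and the SparkSession setup are mine, not part of the code above, and assume the dependencies from step 2 are on the classpath:

import org.apache.spark.sql.SparkSession

object Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StatusLoggerDemo").getOrCreate()

    StatusLogger.toLogger("Job started") // driver-side call

    // executor-side calls work too: the StatusLogger object is initialized
    // from scratch inside each executor JVM, no serialization involved
    spark.sparkContext.parallelize(1 to 10).foreachPartition { _ =>
      StatusLogger.toLogger("partition processed")
    }

    StatusLogger.toLogger("Job finished")
    spark.stop()
  }
}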
5 - Note that your default Spark logs will remain pristine and untouched by this modification. The settings only affect the loggers and appenders you configure this way. I actually wondered why Spark's own settings are not overridden; my guess is that Spark logs through the separate Log4j 1 runtime, so the two configurations never meet (can anybody confirm?). In any case, it is exactly the result I was aiming for.