15

I have a Scala Maven project that uses Spark, and I am trying to implement logging using Logback. I am compiling my application to a jar and deploying it to an EC2 instance where the Spark distribution is installed. My pom.xml includes dependencies for Spark and Logback as follows:

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.1.7</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

When I submit my Spark application, I print out the SLF4J binding on the command line. If I execute the jar's code using java, the binding is to Logback. If I use Spark (i.e. spark-submit), however, the binding is to log4j.

  import org.apache.spark.SparkContext
  import org.slf4j.{Logger, LoggerFactory}
  import org.slf4j.impl.StaticLoggerBinder

  val logger: Logger = LoggerFactory.getLogger(this.getClass)
  val sc: SparkContext = new SparkContext()
  val rdd = sc.textFile("myFile.txt")

  val slb: StaticLoggerBinder = StaticLoggerBinder.getSingleton
  System.out.println("Logger Instance: " + slb.getLoggerFactory)
  System.out.println("Logger Class Type: " + slb.getLoggerFactoryClassStr)

yields

Logger Instance: org.slf4j.impl.Log4jLoggerFactory@a64e035
Logger Class Type: org.slf4j.impl.Log4jLoggerFactory

I understand that both log4j-1.2.17.jar and slf4j-log4j12-1.7.16.jar are in /usr/local/spark/jars, and that Spark is most likely referencing these jars despite the exclusions in my pom.xml, because if I delete them I get a ClassNotFoundException when running spark-submit.

My question is: Is there a way to implement native logging in my application using Logback while preserving Spark's internal logging capabilities? Ideally, I'd like to write my Logback application logs to a file and still allow Spark's logs to be shown on STDOUT.
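For reference, the kind of logback.xml I have in mind is roughly the following (a minimal sketch; the appender name, file path and pattern are just placeholders):

    <configuration>
        <appender name="FILE" class="ch.qos.logback.core.FileAppender">
            <file>/var/log/myapp/application.log</file>
            <encoder>
                <pattern>%d{yyyy-MM-dd HH:mm:ss} %-5level %logger{36} - %msg%n</pattern>
            </encoder>
        </appender>
        <root level="INFO">
            <appender-ref ref="FILE" />
        </root>
    </configuration>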

sbrannon
  • 180
  • 1
  • 6

5 Answers

17

I had encountered a very similar problem.

Our build was similar to yours (but we used sbt) and is described in detail here: https://stackoverflow.com/a/45479379/1549135

Running this solution locally works fine, but spark-submit would ignore all the exclusions and the new logging framework (Logback), because Spark's classpath has priority over the deployed jar. And since Spark's classpath contains log4j 1.2.xx, it would simply load it and ignore our setup.

Solution

I have used several sources, but quoting the Spark 1.6.1 docs (this applies to the latest Spark / 2.2.0 as well):

spark.driver.extraClassPath

Extra classpath entries to prepend to the classpath of the driver. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.

spark.executor.extraClassPath

Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.

What is not written there, though, is that extraClassPath takes precedence over Spark's default classpath!

So now the solution should be quite obvious.

1. Download those jars:

- log4j-over-slf4j-1.7.25.jar
- logback-classic-1.2.3.jar
- logback-core-1.2.3.jar

2. Run the spark-submit:

libs="/absolute/path/to/libs/*"

spark-submit \
  ...
  --master yarn \
  --conf "spark.driver.extraClassPath=$libs" \
  --conf "spark.executor.extraClassPath=$libs" \
  ...
  /my/application/application-fat.jar \
  param1 param2

I am just not yet sure if you can put those jars on HDFS. We have them locally next to the application jar.
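Alternatively, as the docs quoted above mention, the same entries can go into the default properties file instead of the command line (a sketch, using the same placeholder path):

    # conf/spark-defaults.conf
    spark.driver.extraClassPath     /absolute/path/to/libs/*
    spark.executor.extraClassPath   /absolute/path/to/libs/*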

userClassPathFirst

Strangely enough, using Spark 1.6.1 I have also found this option in docs:

spark.driver.userClassPathFirst, spark.executor.userClassPathFirst

(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.

But simply setting:

--conf "spark.driver.userClassPathFirst=true" \
--conf "spark.executor.userClassPathFirst=true" \

Did not work for me. So I am gladly using extraClassPath!

Cheers!


Loading logback.xml

If you face any problems loading logback.xml in Spark, my question here might help you out: Pass system property to spark-submit and read file from classpath or custom path
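For completeness, a rough sketch of what reconfiguring Logback from a custom path can look like, assuming the path is passed as a system property (the property name logback.config.path is just an example, and this only works once SLF4J is actually bound to Logback):

    import ch.qos.logback.classic.LoggerContext
    import ch.qos.logback.classic.joran.JoranConfigurator
    import org.slf4j.LoggerFactory

    // Reconfigure Logback from a file whose location is passed as a system property,
    // e.g. via spark.driver.extraJavaOptions=-Dlogback.config.path=/path/to/logback.xml
    sys.props.get("logback.config.path").foreach { path =>
      val context = LoggerFactory.getILoggerFactory.asInstanceOf[LoggerContext]
      val configurator = new JoranConfigurator()
      configurator.setContext(context)
      context.reset()                 // drop whatever configuration was picked up so far
      configurator.doConfigure(path)  // load the logback.xml from the given path
    }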

Atais
  • 10,857
  • 6
  • 71
  • 111
  • I'm trying to follow this because I have the same problem: I actually want spark-submit to send logs to Loggly, so I'm using Logback and a Loggly appender. I use Maven and an uber jar that contains all the files to send in spark-submit. But what would be the class path there? I don't see how an absolute path is workable. Can you clarify that part? – user1161137 Oct 07 '20 at 18:17
  • I don't understand the question: "But what would be the class path there?" I think you would be better off creating your own question and describing what you have done there. – Atais Oct 07 '20 at 18:38
  • OK, added a question: https://stackoverflow.com/questions/64250847/configuring-apache-spark-logging-with-maven-and-logback – user1161137 Oct 07 '20 at 19:16
  • Found my problem. I couldn't get it working with what you stated above, but I could by shading org.slf4j (as stated in the answer by @matemaciek). I will detail my entire solution in the ticket I posted above. – user1161137 Oct 08 '20 at 03:21
  • It's because you can't pack Logback in the fat jar. It has to be provided externally in my solution, due to classpath prioritization in Spark. – Atais Oct 08 '20 at 07:04
3

I had the same problem: I was trying to use a Logback config file. I tried many permutations, but I could not get it to work.

I was accessing logback through grizzled-slf4j using this SBT dependency:

"org.clapper" %% "grizzled-slf4j" % "1.3.0",

Once I added the log4j config file:

src/main/resources/log4j.properties

my logging worked fine.
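For reference, a minimal log4j.properties along these lines is enough (the pattern is just an example):

    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n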

Sami Badawi
  • 977
  • 1
  • 10
  • 22
  • Ended up using the same approach, unfortunately this is still using log4j as the underlying framework though. Still wondering if anyone was able to configure Logback. – sbrannon Feb 09 '17 at 21:57
  • 1
    I wasted many hours looking. But I would like to know myself. – Sami Badawi Feb 09 '17 at 22:04
3

After much struggle I've found another solution: library shading.

After I've shaded org.slf4j, my application logs are separated from spark logs. Furthermore, logback.xml in my application jar is honored.

Here you can find information on library shading in sbt; in this case it comes down to putting:

assemblyShadeRules in assembly += ShadeRule.rename("org.slf4j.**" -> "your_favourite_prefix.@0").inAll

in your build.sbt settings.
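Since the question itself uses Maven, the equivalent relocation with the maven-shade-plugin would look roughly like this (a sketch; the plugin version and the prefix are placeholders):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.4</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <!-- move org.slf4j.* to your_favourite_prefix.org.slf4j.* inside the fat jar -->
                        <relocation>
                            <pattern>org.slf4j</pattern>
                            <shadedPattern>your_favourite_prefix.org.slf4j</shadedPattern>
                        </relocation>
                    </relocations>
                </configuration>
            </execution>
        </executions>
    </plugin>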


Side note: If you are not sure whether shading actually happened, open your jar in an archive browser and check whether the directory structure reflects the shaded one; in this case your jar should contain the path /your_favourite_prefix/org/slf4j, but not /org/slf4j.

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
matemaciek
  • 621
  • 7
  • 11
  • My intention was to use the framework of my choice for both the application and Spark, and my solution allows it. But yes, if you want separate configs, shading is fine. – Atais Apr 27 '18 at 06:54
  • 1
    This is the only solution that worked in my case. All the other options mentioned (and various combinations of them) were unsuccessful. – seven Aug 28 '19 at 14:35
  • I successfully shaded slf4j in maven, but received the error : SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. Is there something else that needed to be set that you didn't mention? – user1161137 Oct 07 '20 at 20:50
0

I packed logback and log4j-to-slf4j along with my other dependencies and src/main/resources/logback.xml in a fat jar.

When I run spark-submit with

--conf "spark.driver.userClassPathFirst=true" \
--conf "spark.executor.userClassPathFirst=true"

all logging is handled by logback.
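For reference, the dependencies I mean are along these lines, assuming the Log4j 2 bridge org.apache.logging.log4j:log4j-to-slf4j (versions are only examples):

    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <version>1.2.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-to-slf4j</artifactId>
        <version>2.17.1</version>
    </dependency>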

  • 1
    I get the following error when trying that: Exception in thread "main" java.lang.LinkageError: loader constraint violation: when resolving method "org.slf4j.impl.StaticLoggerBinder.getLoggerFactory()Lorg/slf4j/ILoggerFactory;" the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class, org/slf4j/LoggerFactory, and the class loader (instance of sun/misc/Launcher$AppClassLoader) for the method's defining class, org/slf4j/impl/StaticLoggerBinder, have different Class objects for the type org/slf4j/ILoggerFactory used in the signature – user1161137 Oct 07 '20 at 18:27
  • This solution worked for me, but once I added the delta dependency in the fat jar, I got the same error as @user1161137 – Gara Walid Sep 16 '22 at 07:55
  • This worked for me. I ran into this in a scenario where I was using `SparkLauncher` to kick off a job programmatically. I had to make sure that `logback` was in the uber jar, and use `.setConf("spark.driver.userClassPathFirst", "true")` & `.setConf("spark.executor.userClassPathFirst", "true")` on the launcher. – Def_Os Apr 27 '23 at 16:16
0

I had to modify the solution presented by Atais to get it working in cluster mode. This worked for me:

libs="/absolute/path/to/libs/*"

spark-submit \
--master yarn \
--deploy-mode cluster \
... \
--jars $libs \
--conf spark.driver.extraClassPath=log4j-over-slf4j-1.7.25.jar:logback-classic-1.2.3.jar:logback-core-1.2.3.jar:logstash-logback-encoder-6.4.jar \
--conf spark.executor.extraClassPath=log4j-over-slf4j-1.7.25.jar:logback-classic-1.2.3.jar:logback-core-1.2.3.jar:logstash-logback-encoder-6.4.jar \
/my/application/application-fat.jar \
param1 param2

The underlying reason was that the jars were not available to all nodes and had to be made explicitly available (even after submitting with --jars).

Update: Refined the solution further. You can also pass the jars as a list of URLs, i.e. --jars url1,url2,url3. These jars still have to be added to the class path to be prioritized over log4j.

Pulex
  • 11
  • 3