
I have a Spark cluster running on Kubernetes, deployed with the Bitnami Helm chart.

Following the Spark documentation, I created a log4j2 config file at the following location:

/opt/bitnami/spark/conf/log4j2.properties

The configuration in this file works as expected.

Now, I have a Spring Boot-based application that I run on this cluster with the spark-submit command.

This application has a log4j2 config file at the following location:

src/main/resources/log4j2.xml

When I execute this jar using the spark-submit command, the configuration from this file doesn't seem to take effect. I have also supplied the following arguments, taken from other Stack Overflow threads:

--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j2.xml"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j2.xml"

The issue persists.

How can I override the cluster-level logging configuration with the application-level logging file?

adesai
  • This is not related to the question, but why is the Spark app a Spring Boot application? Spark doesn't need Spring Boot; you only need a main method for spark-submit to call. – Abdennacer Lachiheb May 19 '23 at 14:25

2 Answers


You need to point to the path of log4j2.xml on the system that runs spark-submit, starting with file:///, something like this:

spark.driver.extraJavaOptions=-Dlog4j.configurationFile=file:///opt/spark/log/log4j2.xml
Abdennacer Lachiheb
  • Check this answer: https://stackoverflow.com/questions/27781187/how-to-stop-info-messages-displaying-on-spark-console/55596389#55596389 Just like Abdennacer Lachiheb said, adding the file: prefix should resolve the issue, and the answer I shared contains an excellent explanation of this subject. – shalnarkftw May 22 '23 at 12:36

spark.driver.extraJavaOptions and spark.executor.extraJavaOptions (application properties) are used to pass additional Java options to the driver and executor JVMs, respectively.

Your current configuration is not working because -Dlog4j.configuration is the Log4j 1.x system property. Log4j 2 reads log4j.configurationFile instead, so the option is ignored and Log4j falls back to automatic discovery. During automatic discovery, Log4j 2 looks for log4j2.properties on the classpath before log4j2.xml, which would explain why the cluster-level file in the Spark conf directory keeps taking precedence over the one in your jar.

You need to set log4j.configurationFile to the location of your log4j2.xml configuration file. If log4j2.xml is inside your application jar, you can refer to it with the classpath: prefix.

For instance:

--conf "spark.driver.extraJavaOptions=-Dlog4j.configurationFile=classpath:/log4j2.xml"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configurationFile=classpath:/log4j2.xml"

However, if your log4j2.xml configuration file is not in your application jar but in a different location, you need to provide the absolute path to the configuration file.

That would be:

--conf "spark.driver.extraJavaOptions=-Dlog4j.configurationFile=file:/path/to/your/log4j2.xml"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configurationFile=file:/path/to/your/log4j2.xml"

If the issue persists, it is possible that the driver and executor JVMs are not picking up the extraJavaOptions due to classpath issues.
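
One way to narrow this down is to check whether the option reached the JVM at all. The following is a minimal diagnostic sketch, not part of Spark or Log4j: the class name JvmOptionCheck and the helper argsWithPrefix are hypothetical names chosen for illustration. It prints the JVM input arguments that start with -Dlog4j and the resulting value of the system property.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper; the name is an assumption, not a Spark or Log4j API.
final class JvmOptionCheck {

    /** Returns the arguments that start with the given prefix, e.g. "-Dlog4j". */
    static List<String> argsWithPrefix(List<String> jvmArgs, String prefix) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Options passed to this JVM on its command line (-D, -X, ...).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("log4j JVM args: " + argsWithPrefix(jvmArgs, "-Dlog4j"));

        // The value Log4j 2 would read; null means the option never arrived.
        System.out.println("log4j.configurationFile = "
                + System.getProperty("log4j.configurationFile"));
    }
}
```

Calling the same two lines at the start of your own main method reports on the driver JVM only. Checking an executor would require running them inside a task.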

In addition to passing the log4j2.xml file location via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions, you may also need to ensure that the file is accessible in the classpath of your Spark application.

  1. Add the log4j2.xml to your application's resources, so it will be included in the application JAR file. You have mentioned that you've already done this step.

  2. Use the following spark-submit options to specify the log4j2.xml file as the configuration for Log4j:

    --conf "spark.driver.extraJavaOptions=-Dlog4j.configurationFile=classpath:/log4j2.xml"
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configurationFile=classpath:/log4j2.xml"
    
  3. Submit your application with spark-submit, and the log4j2.xml file should be picked up from the application JAR file.

If this still doesn't work, there might be an issue with how your application or Spark is handling the classpath.

For instance, if you're using a "fat" jar (such as one produced by the Maven Shade Plugin), it could be changing the structure of the classpath in a way that makes the log4j2.xml file inaccessible. If that's the case, you might need to adjust your build process to ensure that the log4j2.xml file ends up in the root of the classpath in the final JAR file.
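
To see where the file actually ends up, you can list the jar's entries, much like `jar tf` does. The sketch below assumes nothing about your build: JarEntryCheck and findEntries are hypothetical names, and the jar path is taken from the first command-line argument. It reports every entry named log4j2.xml and whether it sits at the jar root.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper for inspecting a built jar; the names are assumptions.
final class JarEntryCheck {

    /** Returns the full entry path of every file in the jar with the given file name. */
    static List<String> findEntries(String jarPath, String fileName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(entry -> entry.getName())
                    .filter(name -> name.equals(fileName) || name.endsWith("/" + fileName))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the application jar, e.g. the shaded jar from the build.
        for (String name : findEntries(args[0], "log4j2.xml")) {
            String where = name.equals("log4j2.xml") ? "jar root" : "nested";
            System.out.println(name + " (" + where + ")");
        }
    }
}
```

An entry reported as nested is not at the root of the classpath. A jar repackaged by the Spring Boot Maven plugin, for example, places application resources under BOOT-INF/classes/ rather than at the root.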

Additionally, you could try setting the log4j.configurationFile system property in your application's code itself, before any logger is initialized. For example:

System.setProperty("log4j.configurationFile", "classpath:/log4j2.xml");

This would need to be done as early as possible in your application's execution, ideally in the main method, before any other code runs.
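
A slightly fuller sketch of that idea follows. LoggingBootstrap and applyDefaultConfig are hypothetical names; the only real API used is System.getProperty / System.setProperty. The property is set only when it is not already defined, so a -Dlog4j.configurationFile passed on the command line still takes priority.

```java
// Hypothetical bootstrap class; the names are assumptions for illustration.
final class LoggingBootstrap {

    static final String CONFIG_PROPERTY = "log4j.configurationFile";

    /** Sets the property only if it is not already defined; returns the effective value. */
    static String applyDefaultConfig(String location) {
        if (System.getProperty(CONFIG_PROPERTY) == null) {
            System.setProperty(CONFIG_PROPERTY, location);
        }
        return System.getProperty(CONFIG_PROPERTY);
    }

    public static void main(String[] args) {
        // Run this first, before any class that creates a logger is loaded.
        String effective = applyDefaultConfig("classpath:/log4j2.xml");
        System.out.println(CONFIG_PROPERTY + " = " + effective);

        // Start Spring Boot and create the Spark session only after this point.
    }
}
```

One caveat: under spark-submit, Spark's own launcher code may already have initialized logging in the driver JVM before your main method runs. In that case the property is set too late to have any effect, and the JVM option route is the one to pursue.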

VonC
  • Thanks for the answer - I have already tried the second suggestion (with and without supplying --file option to upload the file) and just tried the first suggestion with `-Dlog4j.configuration` and `-Dlog4j.configurationFile`, still no luck. – adesai May 18 '23 at 16:32
  • @adesai OK, I have completed the answer with additional checks/configurations to try out. – VonC May 18 '23 at 16:39
  • I have tried everything with the updated instructions - still no luck! I am using the Maven Shade plugin however rest of the files under `src/main/resources` are being used with the classpath so it shouldn't have any issue with the `log4j2.xml` file as it is under the same location. – adesai May 18 '23 at 17:40
  • @adesai Is your `-Dlog4j.configurationFile` argument specified before the `-cp` argument or the main class in your spark-submit command? Can you confirm, with `jar tf your-jar-file.jar`, that `log4j2.xml` is indeed at the root of the classpath? – VonC May 18 '23 at 17:47
  • Yes, it is on the root of the classpath. – adesai May 22 '23 at 09:55
  • @adesai OK. Is there any syntax error in `log4j2.xml`? Can you try a minimal `log4j2.xml`, with just a single logger and appender, and see if that gets picked up? Use the Maven Dependency Plugin to analyze your project's dependencies and check for conflicts between Log4j versions. As a last resort, you could try configuring Log4j programmatically in your application's code. – VonC May 22 '23 at 12:12
  • log4j2.xml is working fine when I run in standalone mode. – adesai May 26 '23 at 10:41