24

I'm trying to override Spark's default log4j.properties, but haven't had any luck. I tried adding the following to spark-submit:

--conf "spark.executor.extraJavaOptions=Dlog4j.configuration=/tmp/log4j.properties"  
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=/tmp/log4j.properties"

But that didn't seem to work. I also tried the --files option in spark-submit, and that didn't seem to work either. Has anyone gotten logging set up so that each driver has its own log4j.properties file instead of using the default?

I'm using Mesos and Marathon to run the Spark driver. I wasn't sure about the --files option, and I couldn't find any examples of how it's used and what exactly it does.

I should also mention that I manually uploaded the log4j.properties file containing my changes to all my nodes for testing.

The version of Spark is 1.1.0 as of right now.

ColinMc
  • The option should be `-Dlog4j.configuration=file:/tmp/log4j.properties`. Another option is to add the directory containing your log4j.properties to `--driver-class-path`. – vanza Mar 03 '15 at 20:01
    @vanza Just tried your suggestions but still no luck. It keeps taking the default log4j.properties file in conf instead of using the one I specified. – ColinMc Mar 03 '15 at 20:12
  • @vanza I think this didn't work because I had a log4j.properties file in conf directory and that was the first in the classpath. – ColinMc Mar 05 '15 at 15:26
  • I can see that causing it. I think you can override the conf dir by setting `SPARK_CONF_DIR`, but I've never tried that. – vanza Mar 05 '15 at 20:44
  • @ColinMc how did you solve this problem finally? – void Mar 04 '16 at 10:34
  • @AswinJoseRoy I can't remember if this was solved. I don't work for that company that uses Spark anymore so I can't comment if any of the answers are the correct solution. – ColinMc Mar 05 '16 at 12:18

6 Answers

18

For the driver/shell you can set this with the --driver-java-options flag when running the spark-shell or spark-submit scripts.

In Spark you cannot set spark.driver.extraJavaOptions with --conf, because that configuration is applied after the driver JVM has already started. When you use the spark-submit scripts, --driver-java-options substitutes these options into the command that launches the JVM (e.g. java -Dblah MyClass) running the driver.

Note that the -Dlog4j.configuration property should be a valid URL, so if it points to somewhere on your file system, use a file: URL. If the value cannot be converted to a URL, for example due to a MalformedURLException, then log4j will search for the resource on the classpath.

For example, to use a custom log4j.properties file:

./spark-shell --driver-java-options "-Dlog4j.configuration=file:///etc/spark/my-conf/log4j.warnonly.properties"
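
The same flag works with spark-submit. A minimal sketch, assuming a hypothetical application jar and main class (com.example.MyApp and my-app.jar are placeholders):

./spark-submit --class com.example.MyApp --driver-java-options "-Dlog4j.configuration=file:///etc/spark/my-conf/log4j.warnonly.properties" my-app.jar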
NightWolf
5

There are multiple ways to achieve this; which one is best depends on your application's needs:

  • By providing extra Java options to the Spark driver and executors, with your log4j.properties present at the same path on every node of the cluster (or on your local machine if you're running the job locally). Use the command below:

    spark-submit --master local[2] --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties' --class com.test.spark.application.TestSparkJob target/application-0.0.1-SNAPSHOT-jar-with-dependencies.jar prod

If log4j.properties is present at the root of your jar's classpath, then you can skip file: in the command, like below: --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties' --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties'

  • By shipping your log4j.properties file with --files and providing extra Java options to the Spark driver and executors. This way log4j.properties does not need to be present on every node; YARN distributes it. Use the command below:

    spark-submit --master local[2] --files /tmp/log4j.properties --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties' --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties' --class com.test.spark.application.TestSparkJob target/application-0.0.1-SNAPSHOT-jar-with-dependencies.jar prod

  • By changing Spark's default log4j.properties file in the Spark conf directory (a minimal sketch follows below):

    change or update log4j.properties at /etc/spark/conf.dist/log4j.properties
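
For the third option, here is a minimal log4j.properties sketch; the levels and the application logger name are illustrative, not Spark's stock defaults:

    # Log everything at WARN by default, to the console
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    # Keep your own application's logging at INFO (package name is an example)
    log4j.logger.com.test.spark.application=INFO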

I have tried all of these and they worked for me. I would also suggest reading the "Debugging your Application" section of the Spark docs on YARN, which is really helpful: https://spark.apache.org/docs/latest/running-on-yarn.html

Nitesh Saxena
  • I verified that if using "--files s3://xxx/xxxx/log4j.v2.properties", there is no need to have "file:". So this works: --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" – Ning Liu Mar 18 '21 at 16:05
  • the first option worked for me, Spark 3.0.1 standalone – Arkadiy Verman Feb 27 '23 at 08:57
4

Just a couple of details are off.

The conf flags should look like this:
--conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=/tmp/log4j.properties" --files /tmp/log4j.properties

You'll also need to use the --files param to upload the log4j.properties file to the cluster, where the executors can get to it. Also, the configs as stated above assume that you're using client mode; in cluster mode both configs would use the same relative path: -Dlog4j.configuration=log4j.properties
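
Put together, a cluster-mode submission might look like this; the jar and class names are placeholders:

spark-submit --deploy-mode cluster --files /tmp/log4j.properties --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties" --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties" --class com.example.MyApp my-app.jar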

P.S. If your logging overrides also require additional dependencies, you may need to provide them as well: --conf spark.driver.extraClassPath=custom-log4j-appender.jar. See: custom-log4j-appender-in-spark-executor

Good luck

Val
2

I could not get either

--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=/tmp/log4j.properties"

or

--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///tmp/log4j.properties"

to work.

The only one that works for me is the --driver-java-options flag.
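
For reference, that looks something like this; the path, class, and jar names are just examples:

spark-submit --driver-java-options "-Dlog4j.configuration=file:///tmp/log4j.properties" --class com.example.MyApp my-app.jar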

Shinta Smith
2

TL;DR

I'm using Mesos and Marathon to run the Spark driver. I wasn't sure about the --files option, and I couldn't find any examples of how it's used and what exactly it does.

I should also mention that I manually uploaded the log4j.properties file containing my changes to all my nodes for testing.

Since your log4j.properties is already on your nodes, your main problem is that you forgot the file: prefix (plus the missing -D noted in another answer). As it stands, your URIs are not valid.

They should be:

--conf "spark.executor.extraJavaOptions=Dlog4j.configuration=file:/tmp/log4j.properties"  
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/tmp/log4j.properties"

Adding log4j.properties with --files

Sending log4j.properties to your nodes during spark-submit is quite easy. You need to specify:

--files /absolute/path/to/your/log4j.properties 

and it will be available in the working directory of the Spark nodes, so you can access it with:

--conf "spark.executor.extraJavaOptions=Dlog4j.configuration=file:log4j.properties"  
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"

Need more?

If you would like to read about other ways of configuring logging while using spark-submit, please visit my other detailed answer: https://stackoverflow.com/a/55596389/1549135

Atais
1

I don't believe the spark.driver.extraJavaOptions parameter exists. For spark.executor.extraJavaOptions it appears you have a typo. Try this:

--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=/tmp/log4j.properties"
  • You're right about the driver parameter not existing. I can't remember if I had that typo when I ran the job. I was able to use the spark-submit --files parameter successfully after removing the log4j.properties file from the conf directory. – ColinMc Mar 17 '15 at 11:52
  • How can I make this work with EMR? Is this a parameter to `aws emr` command? How do you push a config file to the host prior to running the job? – Synesso Nov 27 '15 at 07:57
  • There is spark.driver.extraJavaOptions - https://spark.apache.org/docs/latest/configuration.html – kensai Apr 22 '19 at 06:33