
In spark-submit, how do I specify log4j.properties?

Here is my script. I have tried all combinations and even just using one local node, but it looks like log4j.properties is not loaded; all DEBUG-level info gets dumped.

current_dir=/tmp
DRIVER_JAVA_OPTIONS="-Dlog4j.configuration=file://${current_dir}/log4j.properties "

spark-submit \
--conf "spark.driver.extraClassPath=$current_dir/lib/*"  \
--conf "spark.driver.extraJavaOptions=-Djava.security.krb5.conf=${current_dir}/config/krb5.conf -Djava.security.auth.login.config=${current_dir}/config/mssqldriver.conf" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file://${current_dir}/log4j.properties" \
--class "my.AppMain" \
--files ${current_dir}/log4j.properties \
--master local[1] \
--driver-java-options "$DRIVER_JAVA_OPTIONS" \
--num-executors 4 \
--driver-memory 16g \
--executor-cores 10 \
--executor-memory 6g \
$current_dir/my-app-SNAPSHOT-assembly.jar

The log4j.properties file:

log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

log4j.additivity.org=false

log4j.logger.org=WARN
log4j.logger.parquet.hadoop=WARN
log4j.logger.com.barcap.eddi=WARN
log4j.logger.com.barcap.mercury=WARN
log4j.logger.yarn=WARN
log4j.logger.io.netty=WARN
log4j.logger.Remoting=WARN   
log4j.logger.org.apache.hadoop=ERROR

# this disables the table creation logging which is so verbose
log4j.logger.hive.ql.parse.ParseDriver=WARN

# this disables pagination nonsense when running in combined mode
log4j.logger.com.barcap.risk.webservice.servlet.PaginationFactory=WARN
OneCricketeer
user1615666
  • You're only specifying `-Dlog4j.configuration` on the executor. Is that intentional? – Yuval Itzchakov Feb 14 '17 at 15:45
  • I never had much luck with `-Dlog4j.configuration=` on Hadoop. Since "log4j.properties" is the default file name, just try to add the *directory* that contains the file to the driver CLASSPATH, and Log4J will find it. Or even simpler, drop your file in your $SPARK_CONF_DIR along with `spark-defaults.conf` and friends... – Samson Scharfrichter Feb 14 '17 at 16:55
  • First try to do this directly, i.e. use a very simple program and do spark-submit --driver-java-options "-Dlog4j.configuration=file:///home/username/file.prop" without anything else. It should work. If not, it could be that your code has dependencies on another slf4j implementation and uses that implementation instead (which means it might be taking their properties). – Assaf Mendelson Feb 15 '17 at 06:51
  • Check out this: https://stackoverflow.com/questions/27781187/how-to-stop-info-messages-displaying-on-spark-console/43747948#43747948 – Rahul Sharma Jul 08 '21 at 22:34

7 Answers


Pay attention: the Spark worker is not your Java application, so you can't use a log4j.properties file from the classpath.

To understand how Spark on YARN will read a log4j.properties file, you can use the log4j.debug=true flag:

spark.executor.extraJavaOptions=-Dlog4j.debug=true

Most of the time, the error is that the file is not found or not available from the worker's YARN container. There is a very useful spark-submit option that allows you to share files: --files.

--files "./log4j.properties"

This will make the file available to all your driver/worker containers. Then add the Java extra option:

-Dlog4j.configuration=log4j.properties

Et voilà!

log4j: Using URL [file:/var/log/ambari-server/hadoop/yarn/local/usercache/hdfs/appcache/application_1524817715596_3370/container_e52_1524817715596_3370_01_000002/log4j.properties] for automatic log4j configuration.
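
Putting both pieces together, a sketch of the full submit command (the master, class name and jar path are placeholders; adapt them to your job):

```shell
# Ship log4j.properties into every container and point both the driver
# and the executors at the local copy in the container's working dir.
# -Dlog4j.debug=true makes Log4j print which URL it resolved.
spark-submit \
  --master yarn \
  --files ./log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlog4j.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlog4j.debug=true" \
  --class my.AppMain \
  ./my-app-assembly.jar
```

With -Dlog4j.debug=true, each container prints the "Using URL [...]" diagnostic shown above, so you can confirm which file was actually loaded.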
Thomas Decaux

How to pass a local log4j.properties file

As I see from your script, you want to:

  1. Pass a local log4j.properties file to the executors
  2. Use this file for each node's configuration

Note two things about the --files setting:

  1. Files uploaded to the Spark cluster with --files will be available in the root directory of each executor's workspace, so there is no need to add any path in file:log4j.properties.
  2. Files listed in --files must be provided with an absolute path!

Fixing your snippet is now very easy:

current_dir=/tmp
log4j_setting="-Dlog4j.configuration=file:log4j.properties"

spark-submit \
...
--conf "spark.driver.extraJavaOptions=${log4j_setting}" \
--conf "spark.executor.extraJavaOptions=${log4j_setting}" \
--class "my.AppMain" \
--files ${current_dir}/log4j.properties \
...
$current_dir/my-app-SNAPSHOT-assembly.jar

Need more?

If you would like to read about other ways of configuring logging while using spark-submit, please visit my other detailed answer: https://stackoverflow.com/a/55596389/1549135

OneCricketeer
Atais

Just to add: you can pass the conf directly via spark-submit; there is no need to modify the defaults conf file.

--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///export/home/siva/log4j.properties

I ran the command below and it worked fine:

/usr/hdp/latest/spark2/bin/spark-submit --master local[*] --files ~/log4j.properties --conf spark.sql.catalogImplementation=hive --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///export/home/siva/log4j.properties ~/SCD/spark-scd-assembly-1.0.jar test_run

Note: if you have extra Java options already configured in the conf file, just append the log4j option and submit.
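
For instance, if spark-defaults.conf already carries driver options, the log4j flag goes on the same line (the GC flag here is just an illustrative pre-existing option):

```
spark.driver.extraJavaOptions  -XX:+UseG1GC -Dlog4j.configuration=file:///export/home/siva/log4j.properties
```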

shiv
  1. Copy spark-defaults.conf to a new app-spark-defaults.conf
  2. Add -Dlog4j.configuration=file:log4j.properties to spark.driver.extraJavaOptions in app-spark-defaults.conf. For example:

    spark.driver.extraJavaOptions -XXOther_flag -Dlog4j.configuration=file:log4j.properties

  3. Run your Spark job using --properties-file pointing at the new conf file. For example:
    spark-submit --properties-file app-spark-defaults.conf --class my.app.class --master yarn --deploy-mode client ~/my-jar.jar

Ehud Lev

Solution for Spark on YARN

For me, running Spark on YARN, just adding --files log4j.properties made everything work.

1. Make sure the directory where you run spark-submit contains the file log4j.properties.
2. Run spark-submit ... --files log4j.properties

Let's see why this works:

1. spark-submit uploads log4j.properties to HDFS, like this:

20/03/31 01:22:51 INFO Client: Uploading resource file:/home/ssd/homework/shaofengfeng/tmp/firesparkl-1.0/log4j.properties -> hdfs://sandbox/user/homework/.sparkStaging/application_1580522585397_2668/log4j.properties

2. When YARN launches containers for the driver or executors, it downloads all the uploaded files into the node's local file cache, including the files under ${spark_home}/jars, ${spark_home}/conf and ${hadoop_conf_dir}, and the files specified by --jars and --files.
3. Before launching the container, YARN exports the classpath and makes soft links, like this:

export CLASSPATH="$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*

ln -sf "/var/hadoop/yarn/local/usercache/homework/filecache/1484419/log4j.properties" "log4j.properties"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
ln -sf "/var/hadoop/yarn/local/usercache/homework/filecache/1484440/apache-log4j-extras-1.2.17.jar" "apache-log4j-extras-1.2.17.jar"

4. After step 3, log4j.properties is already on the CLASSPATH, so there is no need to set spark.driver.extraJavaOptions or spark.executor.extraJavaOptions.
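
To double-check which file Log4j actually loaded inside the containers, one option is to combine this with -Dlog4j.debug=true and search the aggregated logs (the application id below is the one from the upload message above; YARN log aggregation must be enabled):

```shell
# Fetch the aggregated container logs and look for Log4j's own diagnostics
yarn logs -applicationId application_1580522585397_2668 | grep 'log4j:'
```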

shao

Be aware that Spark 3.3.0 switched to Log4j 2, which means you have to configure logging differently.
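
With Log4j 2, the file is usually named log4j2.properties and is selected with -Dlog4j2.configurationFile instead of -Dlog4j.configuration. A minimal sketch roughly equivalent to the properties file in the question (the logger names are examples; adjust them to your packages):

```
# log4j2.properties
rootLogger.level = info
rootLogger.appenderRef.console.ref = console

appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Quiet down noisy frameworks, as in the Log4j 1.x file above
logger.spark.name = org.apache.spark
logger.spark.level = warn
logger.hadoop.name = org.apache.hadoop
logger.hadoop.level = error
```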

Steven

If this is just a self-learning project or a small development project, there is already a log4j.properties in hadoop_home/conf. Just edit that one and add your own loggers.

Jake
  • In most installations the person running the job is not the same person who controls hadoop_home – Krever Nov 15 '17 at 13:19