Updated in Q3 2022
Default config
The default log4j config for Spark on Dataproc is available at `/etc/spark/conf/log4j.properties`. It configures the root logger to log to stderr at the INFO level. But at runtime, driver logs (in client mode) are directed by the Dataproc agent to GCS and streamed back to the client, while executor logs (and driver logs in cluster mode) are redirected by YARN to the `stderr` file in the container's YARN log dir. See this answer for how to get YARN container logs on Dataproc.
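If the linked answer is unavailable, one quick way to pull a container's logs (a sketch assuming YARN log aggregation is enabled on the cluster; the application ID is a placeholder) is to run `yarn logs` on the master node:

```
# Run on the cluster's master node. Get <application-id> from the job output
# or from `yarn application -list -appStates ALL`.
yarn logs -applicationId <application-id>
```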
Consider using `/etc/spark/conf/log4j.properties` as the template for your custom config, and keep using `console` as the target for your logs.
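For illustration, a custom config derived from that template might look like the sketch below (the exact contents of the default file vary by image version, and the `com.example` logger is a placeholder for your own package):

```
# Root logger prints to the console appender at INFO, as in the default config.
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet noisy Spark internals, but turn up logging for your own code.
log4j.logger.org.apache.spark=WARN
log4j.logger.com.example=DEBUG
```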
Cluster level
If you want to configure Spark driver and executor logs at the cluster level, the simplest way is to add `--properties spark-log4j:<key>=<value>,...` when creating the cluster. The properties from the flag will be appended to `/etc/spark/conf/log4j.properties`, which will be used as the default log4j config for all Spark drivers and executors in the cluster. Or you can write an init action to update the file.
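For example, a cluster-creation command along these lines (cluster name, region, and logger names are placeholders) might be:

```
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties='spark-log4j:log4j.logger.org.apache.spark=DEBUG,spark-log4j:log4j.logger.com.example=WARN'
```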
Job level
You can also configure Spark driver and/or executor logs at the job level when submitting the job, in any of the following ways:
1. `--driver-log-levels` (for driver only), for example:

```
gcloud dataproc jobs submit spark ... \
    --driver-log-levels root=WARN,org.apache.spark=DEBUG
```
2. `--files`. If the driver and executors can share the same log4j config, then `gcloud dataproc jobs submit spark ... --files gs://my-bucket/log4j.properties` will be the easiest. Note that the file name must be exactly `log4j.properties`, so it can override the default one.
3. `--files` and `--properties spark.[driver|executor].extraJavaOptions=-Dlog4j.configuration=<config>` (for both driver and executor). Note that `-Dlog4j.configuration` should be set to `file:<filename>`, because the files will be present in the working directory of the YARN container for the driver/executor:

```
gcloud dataproc jobs submit spark ... \
    --files gs://my-bucket/driver-log4j.properties,gs://my-bucket/executor-log4j.properties \
    --properties 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:driver-log4j.properties,spark.executor.extraJavaOptions=-Dlog4j.configuration=file:executor-log4j.properties'
```
See also https://spark.apache.org/docs/latest/running-on-yarn.html#debugging-your-application