After specifying a configuration file in a spark-submit command, as in this answer:
# /job/log4j.properties is the path inside the docker container
spark-submit \
    --master local \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --py-files ./dist/src-1.0-py3-none-any.whl \
    --files "/job/log4j.properties" \
    main.py -input $1 -output $2 -mapper $3 $4  # app args
With the dockerized application structure being:
job/
|-- entrypoint.sh
|-- log4j.properties
|-- main.py
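For reference, the log4j.properties is essentially the standard Spark template with the root level raised to WARN so that only warnings and errors come through (shown here in minimal form, not the exact file):

# Minimal illustration: root level at WARN drops Spark's INFO messages
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n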
I'm getting the following error:
log4j:ERROR Ignoring configuration file [file:/log4j.properties].
log4j:ERROR Could not read configuration file from URL [file:/log4j.properties].
java.io.FileNotFoundException: /log4j.properties (No such file or directory)
It works fine if I set the configuration explicitly through the SparkContext's JVM gateway with PropertyConfigurator.configure:
log4j = sc._jvm.org.apache.log4j
# load the properties file explicitly on the driver's JVM
log4j.PropertyConfigurator.configure("/job/log4j.properties")
logger = log4j.Logger.getLogger("MyLogger")
That is, all Spark INFO-level logging is silenced and I only see warnings and errors, which is what I've set in the configuration file. However, if I just instantiate a logger as follows (the desired approach):
# no PropertyConfigurator call here; relying on the -Dlog4j.configuration
# option passed through spark-submit
log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger("MyLogger")
It doesn't behave as it does when the configuration is set via PropertyConfigurator.configure: Spark's INFO-level logging is not silenced, even though that is exactly what the configuration file specifies. Any idea how to get the logging configuration passed to spark-submit to control the application's logs?
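For what it's worth, here is a quick way to check what the JVM side has actually loaded (a diagnostic sketch using the log4j 1.x API through py4j, not part of the job itself):

# Diagnostic only: report the effective level of the JVM root logger.
# If the file passed via -Dlog4j.configuration was not found, this shows
# Spark's default (INFO) rather than the WARN set in log4j.properties.
root_logger = sc._jvm.org.apache.log4j.LogManager.getRootLogger()
print(root_logger.getEffectiveLevel().toString())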
Using pyspark with Spark 3.0.1 and Python 3.8.0.