
This question has answers covering how to do this on a YARN cluster. But what if I am running a standalone Spark cluster? How can I log from the executors? Logging from the driver is easy using the Log4j logger that we can derive from the SparkContext.

But how can I log from within an RDD's `foreach` or `foreachPartition`? Is there any way I can collect these logs and print them?

void
  • More or less the same way you do on the driver, I guess. If you e.g. have a FileAppender configured with a local path, the file will be created on the workers of your cluster. If you just configure a STDOUT appender, it will all end up in the stdout of your workers. Or were you asking how to actually log from `foreach` (etc.) operations? – TobiSH Mar 29 '18 at 09:43
  • Vis-à-vis the accepted answer at the link you mention in your question, the only line you would need to change is `spark.sparkContext.addPyFile('hdfs:///path/to/logger.py')`. The path needs to be a common path (such as NFS) or a path that exists/can be created on all executors. Also, go through the comments there, since some environment variables like LOG_DIRS need to be set; they are used within the logger.py code. – sujit Mar 29 '18 at 10:08

1 Answer


The answer is to import Python's `logging` module and write your messages with it; the logged messages will end up in the work directory created under the Spark installation location. Nothing else is needed.

I went crazy modifying the log4j.properties file and adding --driver-java-options and spark.executor.extraJavaOptions, but none of that was necessary.

In your Spark program, import `logging` and add log messages straight away, e.g. `logging.warning(...)` with whatever message and variable values you want to check, as in the sketch below.
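Here is a minimal sketch of the idea, assuming a PySpark job submitted to the standalone cluster (the app name and the toy data are made up for illustration):

```python
import logging

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("executor-logging-demo").getOrCreate()

def process_partition(rows):
    # This function runs on an executor, so these messages go to that
    # executor's stderr file under the worker's work directory,
    # not to the driver console.
    logging.warning("processing a partition")
    for row in rows:
        logging.warning("row value: %s", row)

spark.sparkContext.parallelize(range(10), 4).foreachPartition(process_partition)
spark.stop()
```

`logging.warning` is used here because the root logger's default level is WARNING, so the messages appear without any extra logging configuration.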
Then navigate to the work directory. If you have installed Spark at /home/vagrant/spark, that means the /home/vagrant/spark/work directory. There will be a directory for each application, and inside it the executors used for that application are numbered 0, 1, 2, 3, and so on. You have to check each one: in the stderr of whichever executor was created to execute the task, you will see the logging messages.
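For illustration, the layout looks roughly like this (the application ID below is made up):

```
/home/vagrant/spark/work/
└── app-20180329123456-0000/   <- one directory per application
    ├── 0/                     <- one directory per executor
    │   ├── stderr             <- logging output appears here
    │   └── stdout
    └── 1/
        ├── stderr
        └── stdout
```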
Hope this helps you see user-logged messages on the executors when using the Spark standalone cluster mode.

samarkant