We have multiple Apache Spark jobs, and we need to log some events and parameters of task execution for debugging and troubleshooting purposes.
What are the recommended practices for logging in Apache Spark job code?
The obvious options are: Spark's own logInfo (and its related methods, though that is not recommended), a logging framework (such as log4s), or plain println.
Coming from a Java background, writing logs directly to the console feels like bad practice to me; we always used logging frameworks for that.
But if we choose println for logging in a Spark job, we can easily collect the log into a file by redirecting the output from the launching shell script, for example. Moreover, we can see the output in the Spark admin console.

So I don't see what we would gain by using log4s. Could you share the pros and cons of using println for logging inside a Spark job?

MaSEL

2 Answers

Spark uses log4j as the standard library for its own logging. Everything that happens inside Spark gets logged to the shell console and to the configured underlying storage. Spark also provides a template for application writers, so we can use the same log4j library to add whatever messages we want to the logging implementation that Spark already has in place.

Please have a look at this.

As for whether to use println for logging: in my personal experience, I would say no; refer to this link.

After the job has finished, you can collect the logs using YARN from the job history server. For a more detailed answer, look at this.

Community
iec2011007

I would recommend using Log4J directly, without a second thought. You can add DEBUG-, INFO-, and ERROR-level log statements inside your code, and you can apply some of the usual logging best practices, such as:

1) Separate log paths for the different log levels.

2) A rollover policy for the log files.

3) Choosing which packages should log and which should not. For example, I run Spark on AWS, so I also enable the S3-related logs to monitor which files or folders it is scanning or working on.
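
The three points above could look roughly like the following log4j.properties fragment. This is only a sketch, assuming a Spark version that still bundles log4j 1.x; the file paths, size limits, the com.example.myjob package, and the choice of org.apache.hadoop.fs.s3a as the "S3-related" logger are all assumptions made for illustration, not settings taken from this answer.

```properties
# Sketch for log4j 1.x. All paths, sizes and package names are illustrative assumptions.
log4j.rootLogger=INFO, console, appFile, errorFile

# Console output, so messages still show up in the shell.
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# 1) + 2) General log file with a size-based rollover policy.
log4j.appender.appFile=org.apache.log4j.RollingFileAppender
log4j.appender.appFile.File=/var/log/my-spark-job/app.log
log4j.appender.appFile.MaxFileSize=50MB
log4j.appender.appFile.MaxBackupIndex=10
log4j.appender.appFile.layout=org.apache.log4j.PatternLayout
log4j.appender.appFile.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

# 1) + 2) Separate file that only receives ERROR and above.
log4j.appender.errorFile=org.apache.log4j.RollingFileAppender
log4j.appender.errorFile.File=/var/log/my-spark-job/error.log
log4j.appender.errorFile.Threshold=ERROR
log4j.appender.errorFile.MaxFileSize=50MB
log4j.appender.errorFile.MaxBackupIndex=10
log4j.appender.errorFile.layout=org.apache.log4j.PatternLayout
log4j.appender.errorFile.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

# 3) Per-package levels: verbose for the job's own code (hypothetical package)
#    and for the S3 connector (assumed here to be Hadoop's S3A package).
log4j.logger.com.example.myjob=DEBUG
log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
```

Note that a fixed path like the one above is written separately on the driver and on every executor host, so on a cluster you would probably still collect the files through YARN or a similar mechanism.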

Murtaza Kanchwala