We have several Apache Spark jobs, and we need to log certain events and task-execution parameters for debugging and troubleshooting purposes.
What are the recommended practices for logging inside Apache Spark job code?
The obvious options are: Spark's own logInfo (and related methods, although using them is not recommended), a logging framework such as log4s, or plain println.
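To make the comparison concrete, this is roughly what the two approaches I'm weighing look like in our driver code. It's only a sketch: the object name, app name, and messages are placeholders, and I'm assuming a log4s logger backed by the SLF4J/Log4j setup that Spark already ships with.

    import org.apache.spark.sql.SparkSession
    import org.log4s.getLogger

    object MyJob {
      // log4s: a logger for the enclosing object, configurable via the usual Log4j/SLF4J config
      private[this] val logger = getLogger

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("my-job").getOrCreate()

        // Option 1: logging framework
        logger.info(s"Starting job with args: ${args.mkString(" ")}")

        // Option 2: plain println to stdout
        println(s"Starting job with args: ${args.mkString(" ")}")

        // ... actual job logic ...

        spark.stop()
      }
    }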
Coming from a Java background, writing log output directly to the console feels like bad practice to me; we always used a logging framework for that.
But if we choose println logging for the Spark job, we can easily collect the log into a file, for example by redirecting stdout from the shell script that launches the job. Moreover, we can see the output in the Spark admin console.
So I don't see what benefit log4s would give us. Could you share the pros and cons of using println for logging inside a Spark job?