
I am wondering whether it is possible to run a Hadoop/Spark job without generating any logs.

For example, I should be able to trigger a Spark job using spark-submit without it storing logs in any location; that is, the ResourceManager and Spark History Server should not be able to show any information about the application, not even whether it succeeded or failed. If that can be done, I would appreciate similar information for MapReduce jobs and Hive on Tez jobs as well.

I googled, but could not find any information on this.

OneCricketeer
  • I am not aware of a way to make applications invisible in the YARN ResourceManager or History Server. But application logs can be configured; see https://stackoverflow.com/questions/36173601/dataproc-how-do-i-configure-spark-driver-and-executor-log4j-properties – Dagang Sep 14 '22 at 21:42

1 Answer


Each component (YARN containers, the NodeManager process, the ResourceManager process, the History Server) is configured with its own log4j property file. You'll need to override each one, setting its log level to OFF, to "hide" logs.
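As a rough sketch, an override for one of those property files might look like the following. The file paths are assumptions: Spark typically reads `$SPARK_HOME/conf/log4j.properties`, while Hadoop daemons read `$HADOOP_CONF_DIR/log4j.properties`; check where your distribution actually keeps them.

```properties
# Sketch of a log4j (1.x) properties override that silences logging.
# Place in e.g. $SPARK_HOME/conf/log4j.properties or
# $HADOOP_CONF_DIR/log4j.properties, depending on the component.

# OFF is the highest level; nothing is logged to the root logger.
log4j.rootLogger=OFF

# Silence the Spark and Hadoop logger hierarchies explicitly as well,
# in case any appenders are attached below the root.
log4j.logger.org.apache.spark=OFF
log4j.logger.org.apache.hadoop=OFF
```

For the Spark driver and executors specifically, the file can be pointed at per job via spark-submit, e.g. `--conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties` (and the matching `spark.executor.extraJavaOptions`), as described in the question linked in the comment above. Note this only suppresses log output; the application itself still appears in the ResourceManager.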

Keep in mind that you will probably want logs in order to debug why an application failed.
