I have a scenario where a particular log message might be printed a very large number of times (possibly millions). For example, if we log a warning (using the logger.warn() method) for every record with missing fields, an input file that contains many such records (for example, a large file on HDFS) produces a huge volume of log output. This quickly fills up the disk.
To avoid this, I am trying to log only once per N records with missing fields (for example, once every 1000). I can implement all of this logic outside of the log4j package, but I was wondering if there is a cleaner way to do this. Ideally, all of this logic would live in the log4j code or configuration.
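For concreteness, here is a minimal sketch of what the "outside of log4j" version might look like. The names SampledWarner, every, and sink are hypothetical and not part of log4j; the sink is a plain Consumer<String> so the sketch runs without log4j on the classpath, on the assumption that real code would pass something like msg -> logger.warn(msg).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a log4j class): forwards the 1st, (N+1)th,
 * (2N+1)th, ... message to the sink and only counts the rest.
 */
final class SampledWarner {
    private final long every;                 // forward one message per this many calls
    private final Consumer<String> sink;      // assumed to wrap logger.warn(...)
    private final AtomicLong seen = new AtomicLong();

    SampledWarner(long every, Consumer<String> sink) {
        if (every <= 0) {
            throw new IllegalArgumentException("every must be positive");
        }
        this.every = every;
        this.sink = sink;
    }

    /** Counts the call and forwards the message only on sampled occurrences. */
    void warn(String message) {
        long n = seen.incrementAndGet();      // thread-safe running count
        if ((n - 1) % every == 0) {
            sink.accept(message + " (occurrence " + n + ")");
        }
    }

    /** Total number of calls, including the suppressed ones. */
    long totalSeen() {
        return seen.get();
    }
}
```

With every set to 1000, the first offending record is logged right away and then one in every 1000 after that, while totalSeen() still allows a single summary line at the end of the file. It works, but it means wrapping every call site, which is what I would like to avoid.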
This seems like a commonly encountered problem, but I could find hardly any information on it. Any thoughts?