1

I use an AWS MSK cluster with broker logging to CloudWatch turned on. Logging works and I can see the broker logs. We have some topics with cleanup.policy=compact and some with cleanup.policy=delete. The system has been running on the new cluster for about 2 weeks now.

From my research (e.g. https://zendesk.engineering/an-investigation-into-kafka-log-compaction-5e520f4291f0) I understand that Kafka should run the log cleaner and that this activity should leave traces in the logs. However, in my CloudWatch log group I cannot find the words "cleaner" or "cleaned", nor any other trace of the log cleaner running.

Is the log cleaner running at all? It obviously should be, but I can't find anything in the logs to confirm it. We also have a lot of messages that are eligible for cleanup but have not been cleaned for about 2 weeks now.

Kafka cluster version is 2.8.1

amorfis
  • 1
    1) The cleaner only runs on closed log segments, not just on a timed basis. Has enough data been written to close any segments? 2) Is the log cleaner disabled in the broker settings? Also, there are multiple JIRA issues reported about those threads not running properly – OneCricketeer May 09 '22 at 13:47
  • 1
    @OneCricketeer the problem is that we are producing a lot of data and are running out of broker disk space (a few hundred GB), so I think that is enough and there should be many closed segments. Broker settings are pretty much, only with max.message.size set to 15 MB, and delete.topic.enable = true – amorfis May 09 '22 at 14:48
  • 1
    * broker settings are pretty much default. (Why can't I edit a comment after 50 minutes?) – amorfis May 09 '22 at 15:40
  • 1
    `log.cleaner.threads` defaults to 1. Perhaps you should try adding more? – OneCricketeer May 09 '22 at 17:56
  • 1
    Unfortunately I can't set `log.cleaner.threads` on MSK :/ – amorfis May 10 '22 at 13:54

1 Answer

4

It is quite likely that these logs are not being shown in MSK because, by default, they do not go to the main log stream. From https://jaceklaskowski.gitbooks.io/apache-kafka/content/kafka-log-LogCleaner.html:

Please note that Kafka comes with a preconfigured kafka.log.LogCleaner logger in config/log4j.properties:

log4j.appender.cleanerAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.cleanerAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log
log4j.appender.cleanerAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.cleanerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n

log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender
log4j.additivity.kafka.log.LogCleaner=false

That means that the logs of LogCleaner go to logs/log-cleaner.log file at INFO logging level and are not added to the main logs (per log4j.additivity being off).
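To make the routing described above concrete, here is a minimal Python sketch that reads the quoted log4j 1.x properties and reports which appenders a logger writes to and whether its output also propagates to the main log. The helper names `parse_properties` and `logger_destinations` are hypothetical, and the parsing is deliberately simplified: it ignores parent loggers, line continuations, and variable substitution.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def logger_destinations(props, logger):
    """Return (appenders, additive) for a logger in log4j 1.x properties."""
    spec = props.get("log4j.logger." + logger, "")
    # The value has the form "<LEVEL>, appender1, appender2, ..."
    parts = [p.strip() for p in spec.split(",")]
    appenders = [p for p in parts[1:] if p]
    # Additivity defaults to true when the key is absent.
    additive = props.get("log4j.additivity." + logger, "true").lower() == "true"
    return appenders, additive


# Lines copied from the configuration quoted above.
SAMPLE = """
log4j.appender.cleanerAppender.File=${kafka.logs.dir}/log-cleaner.log
log4j.logger.kafka.log.LogCleaner=INFO, cleanerAppender
log4j.additivity.kafka.log.LogCleaner=false
"""

props = parse_properties(SAMPLE)
appenders, additive = logger_destinations(props, "kafka.log.LogCleaner")
files = [props.get("log4j.appender." + name + ".File") for name in appenders]
print("appenders:", appenders)
print("also written to the main log:", additive)
print("files:", files)
```

With additivity off, the only destination is the file behind `cleanerAppender`, which is consistent with the cleaner messages not appearing in the main broker log.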

This is a bit misleading, though, because the LogCleaner takes care of compacted topics. I'm not sure where the deletion of messages in topics with the delete cleanup policy is logged, or at which log level (AWS MSK only exports INFO-level logs).

I would contact AWS support to ask what they do with these logs and whether there is a way to access them.

Alternatively, you could set up open monitoring with Prometheus, which exposes the metrics that Kafka exports via JMX. If it is enabled, there should be a metric (max-clean-time-secs) that will at least tell you whether the cleaner is running, and you may find other information useful for troubleshooting your issue.
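As a rough illustration of that check, the sketch below filters Prometheus text-format output for any sample that mentions the cleaner. It assumes open monitoring is enabled and that the JMX exporter is reachable on port 11001 (verify this for your cluster). The function names, the broker hostname, and the sample metric lines are hypothetical, since the exact metric and label names depend on the exporter configuration. It also assumes samples carry no trailing timestamp.

```python
from urllib.request import urlopen


def fetch_metrics(host, port=11001, timeout=5):
    """Fetch the raw Prometheus text exposition from one broker (assumed port)."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def find_cleaner_metrics(exposition_text):
    """Return {series: value} for samples whose name or labels mention 'cleaner'."""
    found = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "cleaner" not in line.lower():
            continue
        # Assumes "<series> <value>" with no trailing timestamp.
        series, _, value = line.rpartition(" ")
        try:
            found[series] = float(value)
        except ValueError:
            continue
    return found


# Hypothetical sample output; real names depend on the exporter rules.
SAMPLE_METRICS = """
# HELP kafka_log_LogCleaner_Value hypothetical help text
kafka_log_LogCleaner_Value{name="max-clean-time-secs",} 0.0
kafka_server_BrokerTopicMetrics_Count{name="MessagesInPerSec",} 123.0
"""

found = find_cleaner_metrics(SAMPLE_METRICS)
for series, value in found.items():
    print(series, value)
```

Against a real broker you would call something like `find_cleaner_metrics(fetch_metrics("b-1.example-cluster.kafka.us-east-1.amazonaws.com"))` from a host inside the cluster's network, where the hostname is a placeholder. If no cleaner series appear at all, that is itself worth raising with AWS support.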

Gerard Garcia
  • 1
    If you are still unable to tell whether the log cleaner is working, check whether you are hitting the problem described in this answer: https://stackoverflow.com/a/66532893/19059974 – Gerard Garcia May 17 '22 at 06:24
  • 1
    I have already checked that answer and it's not my case. Besides, we have 7-day retention, so a few hours wouldn't make much difference. – amorfis May 17 '22 at 11:38