
In Kafka I have set the retention policy to 3 days in `server.properties`:

############################# Log Retention Policy #############################
...
log.retention.hours=72
...

The topics have `retention.ms` set to 172800000 (48 h).

However, there is still old data in the folder /tmp/kafka-logs and none of it is being deleted. I waited a few hours after changing those properties.

Is there something that needs to be set? All topics are being produced to and consumed from currently.

simPod

3 Answers


Edit: it seems that `cleanup.policy` should default to `delete` according to the Kafka documentation.

`retention.ms` or `retention.bytes` then specify when the data are deleted.


The key was to set `log.cleanup.policy` to `compact` or `delete`. I had not set this.

Running `kafka-topics --zookeeper 127.0.0.1:2181 --topic topic1 --describe` shows the properties set on the topic, e.g. `Configs:retention.ms=172800000,cleanup.policy=compact`.

The `cleanup.policy` has to be set. I also manually set `retention.ms` / `retention.bytes` to control when cleanup is triggered.
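For reference, these topic-level configs can be applied with the `kafka-configs` tool. A sketch, assuming the same topic name (`topic1`) and ZooKeeper address as in the question:

```shell
# Sketch: enable time-based deletion on an existing topic
# (topic name and ZooKeeper address are assumptions from the question)
kafka-configs --zookeeper 127.0.0.1:2181 \
  --entity-type topics --entity-name topic1 \
  --alter --add-config cleanup.policy=delete,retention.ms=172800000

# Confirm the overrides took effect
kafka-topics --zookeeper 127.0.0.1:2181 --topic topic1 --describe
```

Note this requires a running broker/ZooKeeper; on newer Kafka versions the `--bootstrap-server` flag replaces `--zookeeper`.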

simPod

The `compact` policy will only compact values per key. That is, it will eventually trigger a compaction process that leaves only one (the final) value for each key, but it never deletes that last value.

In order to trigger deletion by time, you need to set the `delete` policy. In that case, the deletion process will remove data older than the configured retention.

However, you can set the policy to `compact,delete` to take advantage of both processes on the same topic (this is not available in earlier versions).
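A sketch of applying the combined policy per topic (the topic name and ZooKeeper address are assumptions; the bracket syntax is how `kafka-configs` accepts multi-valued configs, since a bare comma would be read as a config separator):

```shell
# Sketch: apply both compaction and time-based deletion to one topic
kafka-configs --zookeeper 127.0.0.1:2181 \
  --entity-type topics --entity-name topic1 \
  --alter --add-config cleanup.policy=[compact,delete]
```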

However, these processes are not second-exact: they are triggered eventually, subject to conditions such as:

# The interval at which log segments are checked to see 
# if they can be deleted according to the retention policies
log.retention.check.interval.ms=300000

(Check the Kafka documentation for further conditions.) Once those conditions are met, data older than the threshold is guaranteed to be deleted.

In addition, there are different settings for different time granularities, and they have priorities (if a finer-grained one is set, the coarser ones are ignored). Make sure there is no unexpected overriding; please check the comprehensive documentation for details.
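For example, the broker-level time settings and their precedence look like this (a sketch from `server.properties`; the values are illustrative):

```properties
# Broker-level retention: ms beats minutes beats hours.
# If log.retention.ms is set, the coarser settings are ignored.
log.retention.ms=172800000
# log.retention.minutes=2880   # ignored when log.retention.ms is set
# log.retention.hours=72       # ignored when either setting above is set
# A per-topic retention.ms override beats all broker-level settings.
```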

xmar
  • `log.retention.check.interval.ms` gives "Unknown topic config name: log.retention.check.interval.ms" – GuanacoBE Feb 20 '20 at 07:18
  • 2
    This is broker config. Maybe you're setting it up somewhere else? You can find details here: https://kafka.apache.org/documentation/#brokerconfigs – xmar Feb 20 '20 at 11:19
  • Ok my bad, I was searching in Topic Configuration – GuanacoBE Feb 20 '20 at 11:27

As described in "kafka retention policy didn't work as expected":

Log retention is based on the creation date of the log file. Try setting `log.roll.hours` to less than 24 (by default it is 24 * 7 = 168).
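A sketch of the corresponding `server.properties` change (the value 12 is illustrative, not from the question):

```properties
# Roll a new log segment at least every 12 hours so that old segments
# become eligible for deletion sooner (default is 168 = 24 * 7 hours).
log.roll.hours=12
```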

Devstr
  • I set this property but it has no effect. I now have logs older than 7*24 hours, so I suppose this is not it. I guess Kafka doesn't trigger the purge at all. – simPod Feb 12 '18 at 12:38
  • I noticed the topic `consumer offsets` was being deleted but my topics were not. I compared them, and it was caused by not setting the cleanup policy; see my answer. But thanks anyway! – simPod Feb 12 '18 at 12:51