
I wanted a certain Kafka topic to keep only 1 day of data, but it didn't seem to delete any data at all as long as we kept sending data to the topic (i.e., it was active). I tried the topic-level parameter (retention.ms) and the server-side settings:

    log.retention.hours=1 or log.retention.ms=86400000
    cleanup.policy=delete

But these didn't seem to take effect on active topics while we kept sending data to them. Only after we stopped producing to the topic did it follow the retention policy.

So, what's the right config for an active topic to retain data for only a limited time?

OneCricketeer
user3502577

2 Answers


Kafka deletes only passive (closed) log segments, never the active one. You have to tune either log.segment.bytes or log.roll.ms so that the active log segment rolls over into a passive one. Refer to the broker configuration documentation for more information.
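As a concrete sketch, both overrides can be applied per topic with the `kafka-configs.sh` tool that ships with Kafka. The broker address and topic name here are placeholders, and the `--bootstrap-server` flag assumes a reasonably recent Kafka release (older versions used `--zookeeper` instead):

```shell
# Roll the active segment at least every hour (segment.ms) so that
# retention (retention.ms, here 1 day) has closed segments to delete.
# Both values are per-topic overrides, in milliseconds.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config segment.ms=3600000,retention.ms=86400000

# Verify the overrides took effect
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic --describe
```

With segments rolled hourly, a record is deleted roughly one day after it was written, plus up to one segment's worth of delay until the next cleanup pass.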

Kamal Chandraprakash

Log retention is based on the creation date of the log file. Try setting log.roll.hours to a value below 24 (by default it is 24 * 7, i.e. 168 hours).

For 0.8

If you only want to control log file creation per topic, set log.roll.hours.per.topic in the topic config.

For 1.0

Logs are segmented, and the per-topic config for log segments is:

segment.ms — Note: this is in milliseconds, and it overrides the server-wide log.roll.ms setting.
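For example, the per-topic override can be set when the topic is created. The topic name, partition counts, and broker address below are hypothetical, and `--bootstrap-server` assumes a newer Kafka release (the 0.8/1.0-era tools used `--zookeeper`):

```shell
# Create a topic whose active segment rolls hourly and whose
# closed segments are deleted after one day.
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic my-topic --partitions 3 --replication-factor 1 \
  --config segment.ms=3600000 \
  --config retention.ms=86400000 \
  --config cleanup.policy=delete
```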

See also: Purge Kafka Topic

cowbert
  • Thanks for the reply. log.roll.hours helps on the server side. But if I set "log.roll.hours.per.topic", would it apply to all topics? Is there a way to give each topic a different roll interval? – user3502577 Jan 25 '18 at 21:26
  • http://kafka.apache.org/documentation/#topicconfigs — for 0.11 the per-topic config is `segment.ms` (which overrides the server-wide `log.roll.ms`) – cowbert Jan 25 '18 at 22:02