
I have tried creating a Kafka topic configuration that uses compaction and deletion, to achieve the following:

  • Within the retention period, retain the latest version of the key
  • After the retention period, remove any message older than that cutoff timestamp

For this, I have tried the following topic-specific config:

cleanup.policy=[compact,delete]
retention.ms=864000000 (10 days)
min.compaction.lag.ms=3600000 (1 hour)
min.cleanable.dirty.ratio=0.1
segment.ms=3600000 (1 hour)
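For context, this is roughly how such topic-level settings would be applied with the Kafka CLI (a sketch, not the exact command I ran: `my-topic` and the bootstrap address are placeholders, and older releases may require `--zookeeper` instead of `--bootstrap-server` for topic configs):

```shell
# Hedged sketch: apply the topic-level settings above. The brackets around
# compact,delete escape the comma inside a single --add-config entry.
kafka-configs --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config 'cleanup.policy=[compact,delete],retention.ms=864000000,min.compaction.lag.ms=3600000,min.cleanable.dirty.ratio=0.1,segment.ms=3600000'
```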

The broker configuration is as follows:

log.retention.hours=168 (7 days)
log.segment.bytes=1.1 GB
log.cleanup.policy=delete
delete.retention.ms=86400000 (1 day)

When I set `retention.ms` on the topic to a smaller value in testing (e.g. 20 minutes or 1 hour), with no other changes, I can correctly see that the data is pruned after the retention period.

I can see that the data is being compacted as expected, but if I read the topic from the beginning after the 10-day retention period, data much older than 10 days is still there. Is this a problem with such a long retention period?

Am I missing any configuration here? I have checked the Kafka logs and can see the broker rolling segments and compacting as expected, but I can't see anything about deletes.
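For anyone wanting to reproduce the check, this is roughly where I'd look (a sketch: the log and data paths are assumptions that vary by install, and `my-topic-0` is a placeholder partition directory):

```shell
# Retention deletes whole segments only, and never the active segment, so the
# age of the oldest closed segment is what matters. Paths are assumptions.
grep -i "deleting segment" /var/log/kafka/server.log
grep -i "cleaner" /var/log/kafka/log-cleaner.log
ls -lh /var/lib/kafka/data/my-topic-0/
```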

Kafka Version is 5.1.2-1

Hux
  • If this helps: [Similar problem](https://stackoverflow.com/questions/48746136/kafka-doesnt-delete-old-messages-in-topics) – Pratik May 26 '20 at 09:37
  • @pratikmishra I read that, as far as I know `log.roll.hours` is at broker level and `segment.ms` should cover it at topic level? – Hux May 26 '20 at 09:49
  • Hoping you don't have any issues with tombstone records created by compaction, since `delete.retention.ms` defaults to 1 day if not configured. Is the issue with records that didn't have any duplicate keys inserted? Also, did you check the cleaner logs after the 10-day retention time? – Surendra S May 26 '20 at 14:24
  • @Surendra I have checked the broker config and `delete.retention.ms` is `1 day`, I am seeing the correct behaviour for compacted records, e.g. for a message with the same key only one is retained. For messages over the 10 days, these messages currently seem to be retained indefinitely. – Hux May 26 '20 at 14:48
  • This config has been on the cluster for around 12 days, I would have expected it to remove the data at maximum on the 10th day, but nothing seems to have been actioned? What would I search for on the logs? – Hux May 26 '20 at 14:57
  • Can you double check what the `cleanup.policy` of your topic actually is using the `kafka-topics --describe` command? According to [KIP-71](https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist) the policy should be defined like this: `cleanup.policy=compact,delete` (without "[" and "]"). – Michael Heil May 26 '20 at 15:30
  • @mike sorry, it is `compact,delete`. I copied the command used for kafka-topics.sh, which uses `[]` to define a list. It's definitely configured with `compact,delete` – Hux May 26 '20 at 16:00

1 Answer


It might be the case that your topic-level and broker-level configurations override each other, and ultimately the one with higher precedence is evaluated. Topic-level overrides take precedence over broker defaults, so it is worth double-checking which values are actually in effect.
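One way to check which values are actually in effect (a sketch: `my-topic` is a placeholder, and on older releases `kafka-configs` for topics may need `--zookeeper` instead of `--bootstrap-server`):

```shell
# Show the topic's dynamic overrides and describe its current configuration.
kafka-topics --bootstrap-server localhost:9092 --describe --topic my-topic
kafka-configs --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic --describe
```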

misterbaykal