
I'm trying to understand the logic behind the retention period in Apache Kafka. Please help me understand the following scenarios.

  1. If the retention period is set to 0, what will happen? Will all records be deleted?
  2. If we delete the retention parameter itself, will it take the default value?
OneCricketeer

1 Answer

  1. Kafka doesn't allow you to set the retention period to zero in units of hours (log.retention.hours); it has to be at least 1. If you set it to zero, the broker won't start, and you'll get the following error message:

java.lang.IllegalArgumentException: requirement failed: log.retention.ms must be unlimited (-1) or, equal or greater than 1

You can still set it to zero by using the parameters log.retention.minutes or log.retention.ms instead.
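For reference, the broker-side keys involved can be inspected directly in server.properties. This is just a sketch, assuming the stock config/server.properties shipped with Kafka:

    # Show the broker-side retention/segment settings; with a stock config this prints, among others:
    grep -E '^(log\.retention|log\.segment)' ../config/server.properties
    # log.retention.hours=168          (default: 1 week; 0 here prevents broker startup)
    # log.segment.bytes=1073741824     (default: 1 GB per log segment)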

  • Now, let's come to the point of data deletion. In this situation, the old data likely won't get deleted even after the set retention (say 1 hr, or 1 min) has expired, because another variable in server.properties called log.segment.bytes plays a major role there. The value of log.segment.bytes is set to 1 GB by default, and Kafka only performs deletion on a closed segment. So a log segment is closed only once it has reached 1 GB, and only after that does the retention kick in. You therefore need to reduce log.segment.bytes to a value that is at most the cumulative ingestion volume of the data you plan to retain for that short duration. For example, if your retention period is 10 min and you get roughly 1 MB of data per minute, then you can set log.segment.bytes=10485760, which is 1024 x 1024 x 10. You can find an example of how retention depends on both data ingestion and time in this thread.
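As a sketch of that sizing example (assuming a topic named test, as in the experiment below), the 10-minute / 1 MB-per-minute case can be expressed as topic-level overrides:

    # Hypothetical sizing: retain ~10 minutes of data at ~1 MB per minute
    ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test \
      --alter --add-config retention.ms=600000,segment.bytes=10485760
    # retention.ms=600000    -> 10 minutes
    # segment.bytes=10485760 -> 1024 x 1024 x 10 bytes (10 MB)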

  • To test this, we can try a small experiment. Let's start ZooKeeper and Kafka, create a topic called test, and change its retention period to zero.

    1) nohup ./zookeeper-server-start.sh ../config/zookeeper.properties &
    2) nohup ./kafka-server-start.sh ../config/server.properties &
    3) ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
    4) ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test --alter --add-config retention.ms=0
    
  • Now, if we insert sufficient records using kafka-console-producer, we'll see that even after 2-3 minutes the records are not deleted (a sketch of the producer/consumer commands follows these steps). But now, let's change the topic's segment.bytes to 100 bytes.

    5) ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test --alter --add-config segment.bytes=100 
    
  • Now, almost immediately we'll see that old records are getting deleted from Kafka.
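For completeness, here's a sketch of the producing/verification step mentioned above (broker address assumed to be the default localhost:9092):

    # Produce a few test records (type some lines, then Ctrl+C)
    ./kafka-console-producer.sh --broker-list localhost:9092 --topic test
    # Read the topic from the beginning to see which records are still retained
    ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning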

  2. Yes. As with every Kafka parameter in server.properties, if we delete/comment out a property, the default value for that property kicks in. The default retention period is 1 week (log.retention.hours=168).
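At the topic level the behaviour is analogous: removing a per-topic override makes the topic fall back to the broker default. A sketch, reusing the test topic from the experiment above:

    # Show any per-topic overrides currently set
    ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test --describe
    # Remove the override; the topic falls back to the broker default (log.retention.hours=168)
    ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name test \
      --alter --delete-config retention.ms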
Bitswazsky
  • According to the documentation, the default retention time is 1 week (`log.retention.hours=168`). It can be changed on the broker side, but a different value can also be set for each topic – Bartosz Wardziński Jan 05 '19 at 10:43
  • @wardziniak I already mentioned the default retention period in the answer, but I think that adding info about the topic specific params would make the answer complete. I'll update my answer with more details. Thanks. – Bitswazsky Jan 05 '19 at 11:06
  • "even after 2-3 minutes, we'll see the records are not deleted" -- This is because the segment file is still active, right? – OneCricketeer Jan 05 '19 at 19:35
  • @cricket Yes, that's the reason. – Bitswazsky Jan 06 '19 at 05:29