
In my system, consumer offsets are reset if no offset is committed for 24 hours, due to the Kafka offsets retention setting.

Is there a good way to avoid this offset reset?

When the consumer rejoins the consumer group, its offset is set to the oldest offset in the topic. The consumer then reprocesses messages as duplicates, which raises an exception and sends them to a dead-letter topic.

One idea is to have the producer generate dummy data periodically. Are there any best practices related to this? Or is there a better way?

kk jj
  • What do you mean by initialized? Set to the beginning? i.e. the earliest offset/the oldest message in the topic? – Noam Levy Apr 06 '22 at 07:44
  • It means that the offset of the consumer is set to the oldest offset in the topic when the consumer rejoins the consumer group. This is because the offset is reset if there is no offset commit for 24 hours, due to the Kafka retention period setting. – kk jj Apr 06 '22 at 07:51
  • Can you share your settings for `auto.offset.reset` and the retention period? – Noam Levy Apr 06 '22 at 07:57
  • It is set to earliest – kk jj Apr 06 '22 at 08:01
  • Why are there no commits? Is the consumer group down for 24 hours? – Noam Levy Apr 06 '22 at 08:17
  • Does this answer your question? [How does an offset expire for an Apache Kafka consumer group?](https://stackoverflow.com/questions/39131465/how-does-an-offset-expire-for-an-apache-kafka-consumer-group) – Chin Huang Apr 06 '22 at 08:24

1 Answer


The surest way to prevent this is to increase offsets.retention.minutes on the broker from its default 24 hours. It should be set to something longer than the period for which any consumer might be down before there are more pressing concerns than the offset being reset. In many cases, you can set this for a period on the order of hundreds of days: it's hard to imagine a consumer that is simultaneously

  • so important that its offsets can't be reset
  • so unimportant that it not consuming for hundreds of days would go unnoticed and unaddressed

The consumer offset commit messages are themselves so small that retaining them for hundreds of days is unlikely to cause a problem: the topic they're in is also compacted (and if it's not getting compacted, you have bigger problems, like consumers taking minutes to find their offsets).
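A minimal sketch of the broker-side change (the property name is real; the chosen value and the assumption that your broker's current setting is 1440 are examples, not prescriptions):

```properties
# server.properties (broker configuration)
# The questioner's broker evidently uses 1440 (24 hours), the pre-2.0 default;
# modern brokers default to 10080 (7 days).
# Example: retain committed offsets for roughly 200 days.
offsets.retention.minutes=288000
```

The broker must be restarted (or the config rolled out across the cluster) for the change to take effect, since this is a broker-level, not topic-level, setting.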

If you can't get offsets.retention.minutes changed (e.g. because the Kafka brokers in question are owned by a different team which is unresponsive to your concerns), then you will have to treat every consumer that is so important that its offsets can't be reset as a consumer that can never go 24 hours without consuming. This may entail reserving budget for 24x7 on-call coverage to keep that consumer active (or cutting over to a dummy consumer which consumes but never commits: under the modern Kafka group protocol, an active group member prevents the offsets from being expired).
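The dummy-consumer fallback can be sketched as below. This is a hedged sketch, not a drop-in tool: it assumes the confluent-kafka Python client, a broker at `localhost:9092`, and the group/topic names shown, all of which are placeholders. The key points are `enable.auto.commit=False` (so the stored offsets are never touched) and a steady poll loop (so the group stays non-empty, which since Kafka 2.1 / KIP-211 stops offset expiry for the group):

```python
# Keep-alive consumer: joins the real consumer group, polls to stay an
# active member, but never commits, so the group's stored offsets are
# left exactly as the real consumer last committed them.

KEEPALIVE_CONF = {
    "bootstrap.servers": "localhost:9092",  # assumption: local broker
    "group.id": "my-consumer-group",        # assumption: the group whose offsets must survive
    "enable.auto.commit": False,            # never commit: stored offsets stay untouched
    "auto.offset.reset": "latest",          # irrelevant for keep-alive; read nothing old
}

def run_keepalive(topic: str) -> None:
    # Imported here so the config above can be inspected without the client installed.
    from confluent_kafka import Consumer

    consumer = Consumer(KEEPALIVE_CONF)
    consumer.subscribe([topic])
    try:
        while True:
            # Polling sends heartbeats and keeps this member in the group;
            # the returned messages are simply discarded.
            consumer.poll(timeout=1.0)
    finally:
        consumer.close()

if __name__ == "__main__":
    run_keepalive("my-topic")  # assumption: placeholder topic name
```

Note that because this member shares the real group, it will be assigned partitions during the cutover; that is harmless here precisely because it never commits.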

Levi Ramsey