Following up on this question: I would like to understand the semantics of consumer groups and offset expiry. In general, I'm curious how the Kafka protocol determines that a specific committed offset (for a consumer-group, topic, partition combination) has expired. Is it based on periodic commits from consumers that are part of the group protocol, or does the expiry clock only start after all consumers are deemed dead/closed?

I'm thinking this could have repercussions for topic-partitions to which data isn't produced frequently. In my case, we have a consumer group reading from a fairly idle topic (not much data produced). Since the consumer group doesn't periodically commit any offsets, are we ever in danger of losing previously committed offsets? For example, when an unforeseen rebalance happens, the topic-partitions could get reassigned with their committed offsets lost, which would cause the consumer to read data from the earliest point (as configured via `auto.offset.reset`).

OneCricketeer

V1666
1 Answer
For user topics, topic retention (log segment deletion) is completely decoupled from consumer-group offsets. Segments do not "reopen" when a consumer accesses them.
At a minimum, `segment.bytes`, `retention.ms` (or the broker-level `log.retention.minutes`/`log.retention.hours`), and `retention.bytes` all determine when log segments get deleted.
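To make the interplay concrete, here is a minimal Python sketch of how time- and size-based retention trim a single partition's log. It is a simplified model, not broker code: the `Segment` class and `log_start_after_retention` function are hypothetical, the last (active) segment is treated as never deletable, and deletion only ever removes a contiguous run of the oldest segments. As with `retention.ms` and `retention.bytes`, a value of `-1` means "no limit", and the byte limit applies per partition.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    base_offset: int       # first offset stored in this segment
    size_bytes: int        # segment size on disk
    max_timestamp_ms: int  # newest record timestamp in this segment


def log_start_after_retention(segments, now_ms, retention_ms, retention_bytes):
    """Return the partition's log start offset after retention runs.

    Simplified model: segments are ordered oldest first, the last one is the
    active segment and is never deleted, and -1 disables a limit.
    """
    if not segments:
        raise ValueError("a partition always has at least one (active) segment")
    remaining = list(segments)  # copy so the caller's list is untouched

    # Time-based: drop the oldest closed segments whose newest record is too old.
    if retention_ms >= 0:
        while len(remaining) > 1 and now_ms - remaining[0].max_timestamp_ms > retention_ms:
            remaining.pop(0)

    # Size-based: drop the oldest closed segments while the log stays at or
    # above retention_bytes without them.
    if retention_bytes >= 0:
        excess = sum(s.size_bytes for s in remaining) - retention_bytes
        while len(remaining) > 1 and excess - remaining[0].size_bytes >= 0:
            excess -= remaining.pop(0).size_bytes

    # Consumers whose committed offset is below this value can no longer fetch it.
    return remaining[0].base_offset


segments = [Segment(0, 100, 1_000), Segment(50, 100, 5_000), Segment(120, 40, 9_000)]
print(log_start_after_retention(segments, now_ms=10_000, retention_ms=6_000, retention_bytes=-1))  # → 50
```

Because a whole segment is removed only once its newest record is past `retention.ms`, a larger `segment.bytes` lets older records stay on disk well beyond the configured retention.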
For the internal `__consumer_offsets` topic, `offsets.retention.minutes` controls when committed offsets expire and get deleted (also in coordination with that topic's own `segment.bytes`).
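How `offsets.retention.minutes` is measured has changed across broker versions, which is the crux of the question. Before Kafka 2.1, each committed offset expired a fixed time after its own last commit, even if the group was still active. Since Kafka 2.1 (KIP-211), offsets of an active group for topics it is still subscribed to are not expired, and the clock starts once the group becomes empty; standalone consumers that use `assign()` are still measured from their last commit. The Python sketch below models those rules under simplifying assumptions (for example, it ignores groups that are mid-rebalance and non-consumer group protocols); `GroupView` and `is_offset_expired` are hypothetical names, not broker APIs.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class GroupView:
    """Hypothetical, simplified view of what the group coordinator knows."""
    has_members: bool                     # False once every consumer has left or timed out
    uses_subscribe: bool                  # True for subscribe(); False for standalone assign() consumers
    subscribed_topics: Set[str] = field(default_factory=set)
    empty_since_ms: Optional[int] = None  # when the group last became empty, if known


def is_offset_expired(topic, commit_ts_ms, group, now_ms, retention_ms, kip211=True):
    """Model of whether one committed offset is eligible for expiry."""
    def elapsed_since(base_ms):
        return now_ms - base_ms >= retention_ms

    if not kip211 or not group.uses_subscribe:
        # Pre-2.1 brokers, and standalone consumers on any version:
        # measured from this offset's last commit, even if consumers are still polling.
        return elapsed_since(commit_ts_ms)
    if not group.has_members:
        # Empty group: the clock starts when the group became empty
        # (falling back to the commit time if that moment is unknown).
        base = group.empty_since_ms if group.empty_since_ms is not None else commit_ts_ms
        return elapsed_since(base)
    if topic in group.subscribed_topics:
        # Active group still subscribed to the topic: never expired, however idle it is.
        return False
    # Active group that no longer subscribes to this topic: measured from the last commit.
    return elapsed_since(commit_ts_ms)


DAY_MS = 24 * 60 * 60 * 1000
idle_group = GroupView(has_members=True, uses_subscribe=True, subscribed_topics={"idle-topic"})
print(is_offset_expired("idle-topic", 0, idle_group, 30 * DAY_MS, 7 * DAY_MS))                # → False
print(is_offset_expired("idle-topic", 0, idle_group, 30 * DAY_MS, 7 * DAY_MS, kip211=False))  # → True
```

The broker runs this kind of check every `offsets.retention.check.interval.ms` (10 minutes by default), and the default `offsets.retention.minutes` has been 7 days since Kafka 2.0 (1 day before that).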
The broker, not the consumers, periodically removes closed segments. If a consumer lags considerably and requests an offset from a segment that has already been deleted, `auto.offset.reset` is applied.
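That last point can be modeled as a small decision function. This is a simplified Python sketch, not the consumer client's code: `resolve_fetch_offset` is a hypothetical name, `LookupError` stands in for the exceptions the Java client raises when `auto.offset.reset=none`, and `latest` is used as the default because that is the client's default value.

```python
def resolve_fetch_offset(committed, log_start, log_end, auto_offset_reset="latest"):
    """Return the offset a consumer would start fetching from.

    committed is None when the group has no committed offset for the partition
    (never committed, or the commit has expired).
    """
    if committed is not None and log_start <= committed <= log_end:
        return committed  # still within the log: resume where the group left off
    if auto_offset_reset == "earliest":
        return log_start  # re-read everything that retention has kept
    if auto_offset_reset == "latest":
        return log_end    # skip to new records only
    if auto_offset_reset == "none":
        raise LookupError("no valid committed offset and auto.offset.reset=none")
    raise ValueError(f"unknown auto.offset.reset value: {auto_offset_reset!r}")


# A lagging group whose committed offset 42 was deleted (the log now starts at 50):
print(resolve_fetch_offset(42, log_start=50, log_end=120, auto_offset_reset="earliest"))  # → 50
```

Both situations discussed in this thread end up in the same branch: an expired commit (`committed is None`) and a commit pointing into deleted segments both fall through to `auto.offset.reset`.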

OneCricketeer
- So, you are saying offsets can expire due to other configs like `segment.bytes` and `retention.ms`? I would also like to confirm whether consumers (that are part of the `group protocol` and are started via the `consumer.subscribe()` mechanism) polling from idle topic-partitions can lose offsets to `offsets.retention.minutes` due to a lack of periodic commits, since no data is being returned by `poll`. – V1666 Mar 08 '22 at 00:18
- Polling doesn't update the offsets topic; only commits do. But yes, for `__consumer_offsets` that is true. I thought you meant user topics. – OneCricketeer Mar 08 '22 at 01:02
- So the potential reasons to lose committed offsets are the segment and retention configurations of the `__consumer_offsets` topic. Just to confirm: can a lack of commits on user-specific topics lead to expiry of previously committed offsets (with `offsets.retention.minutes` coming into the picture)? Let's consider a hypothetical scenario with a very large topic to which data is rarely published, and assume there are always active consumers (part of the same consumer group) polling from this topic. – V1666 Mar 08 '22 at 01:10
- If the consumer group has low lag, then yes, the head of the topic (its oldest segments) can start getting dropped. But those groups are always reading the tail of the topic and shouldn't care. – OneCricketeer Mar 08 '22 at 01:15