The surest way to prevent this is to increase offsets.retention.minutes on the broker from its default of 24 hours (brokers from Kafka 2.0 onward default to 7 days). Set it to something longer than the longest period any consumer might be down before you have more pressing concerns than its offsets being reset. In many cases you can set it on the order of hundreds of days: it's hard to imagine a consumer that is simultaneously
- so important that its offsets can't be reset
- so unimportant that its failing to consume for hundreds of days would go unnoticed and unaddressed
The consumer offset commit messages are themselves so small that retaining them for hundreds of days is unlikely to cause a problem, and the topic they're in is compacted anyway (if it's not getting compacted, you have bigger problems, like consumers taking minutes to find their offsets).
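To make the "hundreds of days" figure concrete: the setting is expressed in minutes, so the value has to be converted. The sketch below does that arithmetic and prints the line you would put in the broker's server.properties. The 200-day figure and the helper names are illustrative assumptions, not recommendations from any Kafka documentation.

```python
MINUTES_PER_DAY = 24 * 60


def offsets_retention_minutes(days: int) -> int:
    """Convert a retention period in days to the minutes the broker setting expects."""
    if days <= 0:
        raise ValueError("retention period must be a positive number of days")
    return days * MINUTES_PER_DAY


def server_properties_line(days: int) -> str:
    """Render the broker config line for the given retention period."""
    return f"offsets.retention.minutes={offsets_retention_minutes(days)}"


if __name__ == "__main__":
    # 200 days is an arbitrary example value.
    print(server_properties_line(200))  # offsets.retention.minutes=288000
```

This is a broker-side setting, so it takes effect for every consumer group on that cluster, not only the one you care about.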
If you can't get offsets.retention.minutes changed (e.g. because the Kafka brokers in question are owned by a different team that is unresponsive to your concerns), then you will have to treat every consumer whose offsets can't be reset as a consumer that can never be in a not-consuming state for a full retention period (24 hours, in this case). This may entail reserving budget for 24x7 on-call to keep that consumer active (or to cut over to a dummy consumer which consumes but never commits: in the modern Kafka protocols that will prevent the offsets from being lost).
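A dummy consumer of the kind described above might look roughly like the following sketch. It assumes the confluent-kafka Python client is installed; the function names, the broker address, the group name and the topic are all placeholders. Whether keeping the group alive is enough to stop the offsets from expiring depends on your broker version, as noted above, so verify it against your own cluster before relying on it.

```python
from typing import Callable, Iterable


def placeholder_config(bootstrap_servers: str, group_id: str) -> dict:
    """Build a consumer config that joins the group but never commits."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,         # must be the real consumer's group id
        "enable.auto.commit": False,  # leave the committed offsets untouched
    }


def run_placeholder(
    bootstrap_servers: str,
    group_id: str,
    topics: Iterable[str],
    should_stop: Callable[[], bool] = lambda: False,
) -> None:
    """Poll and discard messages until should_stop() returns True."""
    # Imported here so the config helper is usable without the client library.
    from confluent_kafka import Consumer

    consumer = Consumer(placeholder_config(bootstrap_servers, group_id))
    consumer.subscribe(list(topics))
    try:
        while not should_stop():
            # Messages (or None on timeout) are deliberately thrown away,
            # and commit() is never called.
            consumer.poll(timeout=1.0)
    finally:
        consumer.close()


if __name__ == "__main__":
    # Hypothetical values; replace with your own cluster, group and topic.
    run_placeholder("localhost:9092", "important-consumer-group", ["orders"])
```

Because nothing is ever committed, the real consumer should resume from its last committed offsets once it takes the group back over.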