27

The document https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html says that "Note that with auto-commit enabled, a call to poll will always commit the last offset returned by the previous poll. It doesn’t know which events were actually processed, so it is critical to always process all the events returned by poll before calling poll again (or before calling close(), it will also automatically commit offsets)". If that's the case, how does it work when auto.commit.interval.ms is larger than the time it takes to process the messages received from the previous poll()?

To make it more concrete, consider the scenario where I have the following:

enable.auto.commit=true

auto.commit.interval.ms=10

And I call poll() in a loop.

1) On the first call to poll(), I get 1000 messages (offsets 2000-3000) and it takes 1 ms to process all 1000 messages.

2) I call poll() again. This 2nd poll() call should commit the latest offset, 3000, returned by the previous poll(), but since auto.commit.interval.ms is set to 10 ms, it won't commit the offset yet, right?

In this scenario, will the committed offset lag further and further behind the latest offset that was actually processed?

Could someone clarify/confirm?

tourist
Deeps

1 Answer

30

You describe the behavior correctly. However, your conclusion is not correct: the committed offset will not lag further and further behind. Once the auto-commit interval has elapsed, the next call to poll() commits the offsets of all messages returned by earlier polls.

Say you call poll() every 10 ms and set the commit interval to 100 ms. Then every 10th call to poll() commits, and that commit covers all messages returned by the preceding 10 poll() calls.
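The timing argument above can be checked with a small simulation. This is only a sketch of the idea, not the real consumer: the class `AutoCommitSimulation` and its `simulate` method are made-up names, the clock is simulated, and it assumes the commit check runs at the start of each poll and that the commit timer starts when the consumer is created. It ignores asynchronous commits, rebalances, and close().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of interval-based auto-commit (not Kafka code).
class AutoCommitSimulation {

    // Returns the offsets that would be committed, in order.
    static List<Long> simulate(long pollEveryMs, long commitIntervalMs,
                               long batchSize, int polls) {
        List<Long> committed = new ArrayList<>();
        long position = 0;                          // offsets returned by all polls so far
        long nextCommitDeadline = commitIntervalMs; // assumed: timer starts at t = 0
        for (int i = 0; i < polls; i++) {
            long now = i * pollEveryMs;             // simulated clock
            // Assumed: the commit check runs before new records are fetched,
            // so it covers everything returned by earlier polls.
            if (now >= nextCommitDeadline) {
                committed.add(position);
                nextCommitDeadline = now + commitIntervalMs;
            }
            position += batchSize;                  // this poll returns one more batch
        }
        return committed;
    }

    public static void main(String[] args) {
        // poll every 10 ms, commit interval 100 ms, 1000 records per poll
        System.out.println(simulate(10, 100, 1000, 31));
        // poll every 100 ms, commit interval 10 ms: every poll after the first commits
        System.out.println(simulate(100, 10, 1000, 4));
    }
}
```

In this model the first call commits only on polls 11, 21 and 31, each time covering the ten batches before it, so the lag is bounded by one commit interval's worth of records. When the poll interval is longer than the commit interval, every poll after the first commits.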

Dmitry Minkovsky
Matthias J. Sax
  • You are right. Eventually 2 offsets (committed and latest processed) are synched up but until then committed offset will keep falling behind. – Deeps Jul 13 '16 at 17:54
  • 2
    Between two commits yes. This is intended behavior to reduce the number of commits Kafka has to process. Recall that Kafka provides at-least-once delivery guarantee. Thus, it's a tradeoff between number of commits and how much data needs to be reprocessed in case of failure. – Matthias J. Sax Jul 13 '16 at 18:02
  • what if poll call is after 100ms and set auto-commit-interval to 10ms. – Manish Jaiswal Oct 17 '17 at 11:40
  • 2
    In KafkaStreams, consumer `auto.commit` is disabled, and the library does manual commits. – Matthias J. Sax Oct 18 '17 at 15:42
  • @ManishJaiswal Since *commit()* is a post-*poll()* call, it will trigger commit after 100ms (in your scenario). [This](https://stackoverflow.com/a/46547165) elaborates on it. – CᴴᴀZ Sep 20 '18 at 10:57
  • @Matthias J. Sax With `auto.commit.interval.ms=10`, if processing all messages takes 1ms, then the next `poll()` will not bring new records, right? Once 10ms have passed and we make a new `poll()`, we will get the next batch, right? – Anup Jan 20 '20 at 06:20
  • How often `poll()` is called is independent of the `auto.commit.interval.ms` config. Hence, if you do a poll and processing all those messages takes 1ms, there will be a second poll(), even if no commit happens. – Matthias J. Sax Jan 21 '20 at 17:23
  • As far as I understand, the 'commit on poll' behavior differs across clients, as some clients also do commits in the background. At least that's the way I read https://docs.confluent.io/platform/current/clients/consumer.html#message-handling – Ant Feb 16 '23 at 22:35
  • Could be. The question was about the Java client that ships with Apache Kafka though. -- Note, Apache Kafka only ships Java clients. Other clients are not part of Apache Kafka. – Matthias J. Sax Feb 16 '23 at 22:45