0

I'm running Kafka (version 0.10.2) with Spring-data (version 1.5.1.RELEASE), Spring-kafka (version 1.1.1.RELEASE).

I have a topic which one consumer group is polling from. I noticed that sometimes, when one consumer restarts, the topic's lag turns instantly to a much higher number. After some research I came to conclusion that Kafka restarting the offsets, but I can't understand why.

enable.auto.commit = true
auto.commit.interval.ms = 5000
auto.offset.reset = smallest
log.retention.hours=168

The lag is usually very low (below 500) and being consumed in a few ms, so it can't be a out of range index (or can it?)

Someone have an idea maybe?

Yuval
  • 764
  • 1
  • 9
  • 23

1 Answers1

0

I don't think it's actually committing the offsets as frequently as you expect, therefore, when a consumer restarts, the group rebalances, then picks up at the most recent auto-committed offset.

Commits happen only periodically (5 seconds, per your config), not on a message-per-message basis. Thus, it should be expected to see at most 5 seconds worth of duplicated data, but not the beginning of the topic, unless offsets are not being committed at all (you should setup simple log4j logging in the clients in order to determine this)

If you want finer control, disable auto offset commits, and call the commitSync or commitAsync methods of the Consumer object (these are the methods of the core Java API, not sure about Spring)

One option might be to upgrade your Spring clients like Gary is saying below. Since you're running Kafka 0.10.2+, this shouldn't be a problem.

OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
  • Nope, definitely the same messages are being read again. Messages from a week ago – Yuval Nov 19 '18 at 07:49
  • 1
    You should upgrade to 1.3.x; 1.1.x is very old and has a very complicated threading model. KIP-62 allowed us to rewrite the threading model and make it much simpler. The current 1.3.x release is 1.3.7; 1.3.8 will be released next week. With brokers before 2.0.0, the consumer offsets are removed after 24 hours so if you get no messages in that time (e.g. over a weekend); the offsets will be reset. 2.0.0 changed the default to 7 days. – Gary Russell Nov 19 '18 at 15:13
  • @Yuval See above ^^ – OneCricketeer Nov 19 '18 at 16:55
  • Thanks @GaryRussell. Where can I find a documentation for this behavior? How can I configure it? – Yuval Nov 20 '18 at 09:56
  • You can read more about the offset retention change here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-186%3A+Increase+offsets+retention+default+to+7+days – kkflf Nov 20 '18 at 11:19
  • Also, ´enable.auto.commit´ commit the offset on each poll and not only periodically as defined by ´ auto.commit.interval.ms´. So it will commit both on poll and after 5 seconds. I assume this is to prevent the offset from expiring due to exceeding ´offsets.retention.minutes´, maybe somebody can confirm this? – kkflf Nov 20 '18 at 11:22
  • 2
    Good point; with auto commit; the offsets shouldn't expire. I would still recommend upgrading to a more modern version of spring-kafka, though. 1.1.x is no longer supported; you should go to 1.3.7 at a minimum; the current vesrsion is 2.1.0 (2.1.1 next week). – Gary Russell Nov 20 '18 at 14:01
  • @Gary I assume there's a changelog for keeping track of all these releases features/fixes? – OneCricketeer Nov 20 '18 at 14:22
  • We started updating the [GitHub releases](https://github.com/spring-projects/spring-kafka/releases) this year; the [reference manual](https://docs.spring.io/spring-kafka/reference/html/) has a "what's new" for the current release and an appendix for changes in older releases. The [project page](https://spring.io/projects/spring-kafka) has a matrix showing versions and compatibility. – Gary Russell Nov 20 '18 at 14:38