
I'm learning Kafka and would appreciate help understanding one thing. A "producer" sends a message to a Kafka topic. It stays there for some time (7 days by default, right?).

But once a "consumer" has received a message, there is not much sense in keeping it there eternally. I expected messages to disappear once a consumer gets them. Otherwise, when I connect to Kafka again, I will download the same messages again. So I have to manage duplicate avoidance.

What's the logic behind it?

Regards

Gosforth

1 Answer

A "producer" sends a message to a Kafka topic. It stays there for some time (7 days by default, right?).

Yes, a producer sends data to a Kafka topic. Each topic has its own configurable cleanup.policy, which defaults to delete, and a retention period (retention.ms) that defaults to 7 days. You can also limit retention by size with retention.bytes, which applies per partition.
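As a rough illustration of those settings, here is a small Python sketch. The keys in topic_config are real topic-level setting names with their documented defaults; the is_expired helper is hypothetical and only models the idea of time-based retention. Real brokers delete whole log segments rather than individual records, so treat this as a simplification.

```python
# Topic-level setting names with their documented default values.
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000  # 7 days expressed in milliseconds

topic_config = {
    "cleanup.policy": "delete",          # old data is deleted, not compacted
    "retention.ms": str(SEVEN_DAYS_MS),  # time-based retention
    "retention.bytes": "-1",             # -1 means no size-based limit
}


def is_expired(record_timestamp_ms, now_ms, retention_ms=SEVEN_DAYS_MS):
    """Hypothetical helper: True once a record is older than the retention period.

    Note that expiry depends only on age, never on whether a consumer
    has already read the record.
    """
    return now_ms - record_timestamp_ms > retention_ms
```

The point to notice is that nothing in the expiry check refers to consumers: a message is kept or removed based on the topic's retention settings alone.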

But once a "consumer" has received a message, there is not much sense in keeping it there eternally.

Kafka can be seen as a publish/subscribe messaging system (although it is mainly a streaming platform). Its great benefit is that more than one consumer can read the same messages of a topic. Unlike in many other messaging systems, the data is not deleted after a consumer acknowledges it.

Otherwise, when I connect to Kafka again, I will download the same messages again. So I have to manage duplicate avoidance.

Kafka has the concepts of "Offsets" and "ConsumerGroups", and I highly recommend getting familiar with them as they are essential when working with Kafka. Each consumer is part of a ConsumerGroup, and each message in a topic partition has a unique identifier called an "offset". The offset stays with the same message for its lifetime.

Each ConsumerGroup keeps track of the messages (offsets) that it has already consumed. If you do not want to read the same messages again, your ConsumerGroup just has to commit those offsets and it will not read them again.

That way you will not consume duplicates, while other consumers (with a different ConsumerGroup) are still able to read all messages.
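To make the idea concrete, here is a toy in-memory model in Python. It is not the Kafka client API: ToyPartition and its methods are made-up names, and it assumes a single partition and that a group with no committed offset starts reading from the beginning of the log.

```python
class ToyPartition:
    """Toy model of one partition: an append-only log plus per-group offsets."""

    def __init__(self):
        self.records = []    # list index == offset of the record
        self.committed = {}  # group id -> next offset that group should read

    def append(self, value):
        """Producer side: add a record and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def poll(self, group_id):
        """Consumer side: return (offset, value) pairs not yet committed by this group.

        Reading never removes anything from the log.
        """
        start = self.committed.get(group_id, 0)
        return list(enumerate(self.records))[start:]

    def commit(self, group_id, next_offset):
        """Remember where this group should continue reading."""
        self.committed[group_id] = next_offset


log = ToyPartition()
for value in ["a", "b", "c"]:
    log.append(value)

first = log.poll("billing")              # group "billing" reads everything so far
log.commit("billing", first[-1][0] + 1)  # commit last consumed offset + 1

log.append("d")
second = log.poll("billing")             # only the new record, no duplicates
other = log.poll("audit")                # a different group still sees all records
```

Here the "billing" group receives only the new record on its second poll, while the "audit" group, which has committed nothing, can still read the whole log.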

Michael Heil