Duplicate offsets in a Kafka topic with more than one partition

Question

I am using kafka_2.10-0.10.0.1 with zookeeper-3.4.10. I know that there are several types of offsets. I have two questions:

- I want to know the type of the offset returned by ConsumerRecord.offset().
- If I use a topic created with 10 partitions, can I obtain a set of records with the same offset value?

In my program, I need to obtain a list of records with different offset values. Do I have to use a topic with a single partition to achieve this goal?

I read that there are offsets stored in Zookeeper and others stored in Kafka broker... — DaliMidou, Feb 02 '18 at 22:10
If your client is version 0.10 (your version of Kafka) or later, it should store its offsets in Kafka. See https://stackoverflow.com/questions/41137281/offsets-stored-in-zookeeper-or-kafka. Either way, your client should be storing its offsets in ZK or in Kafka, not both. — Dmitry Minkovsky, Feb 02 '18 at 22:13
There are actually three types of offsets. See https://stackoverflow.com/questions/27499277/number-of-commits-and-offset-in-each-partition-of-a-kafka-topic — DaliMidou, Feb 02 '18 at 22:14
Oh in that sense. I’ve never heard those called types of offsets. Thanks for the link. — Dmitry Minkovsky, Feb 02 '18 at 22:17

score 1 · Answer 1 · answered Feb 02 '18 at 22:23


I want to know the type of the offset returned by ConsumerRecord.offset().

This is the offset of the record within the topic-partition the record came from.
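To make "offset within the topic-partition" concrete, here is a minimal sketch in plain Java. It is only a hypothetical in-memory model, not the Kafka client API: the class PartitionedLogSketch and its methods append, read, and recordId are names invented for illustration. It assumes each partition is an independent append-only list whose offsets start at 0, so the same offset value shows up in every partition while the (partition, offset) pair stays unique.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of a partitioned topic; NOT the Kafka client API.
class PartitionedLogSketch {
    // One append-only list per partition; a record's offset is its index in that list.
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLogSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<String>());
        }
    }

    // Appends a value to one partition and returns the offset it received there.
    long append(int partition, String value) {
        List<String> log = partitions.get(partition);
        log.add(value);
        return log.size() - 1;
    }

    // Reads the value stored at (partition, offset).
    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }

    // Builds an identifier that is unique within the topic (assumed format: "partition-offset").
    static String recordId(int partition, long offset) {
        return partition + "-" + offset;
    }

    public static void main(String[] args) {
        PartitionedLogSketch topic = new PartitionedLogSketch(3);
        Set<Long> offsetsSeen = new HashSet<>();
        Set<String> idsSeen = new HashSet<>();
        for (int p = 0; p < 3; p++) {
            for (int i = 0; i < 2; i++) {
                long offset = topic.append(p, "value-" + p + "-" + i);
                offsetsSeen.add(offset);
                idsSeen.add(recordId(p, offset));
            }
        }
        // Offsets 0 and 1 repeat in every partition, so only 2 distinct offsets exist.
        System.out.println("distinct offsets: " + offsetsSeen.size());              // prints 2
        // The (partition, offset) pair is different for each of the 6 records.
        System.out.println("distinct partition-offset ids: " + idsSeen.size());     // prints 6
    }
}
```

With a real consumer, the equivalent of recordId would combine ConsumerRecord.partition() with ConsumerRecord.offset().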

If I use a topic created with 10 partitions, can I obtain a set of records with the same offset value?

Yes, you can seek to that offset in each partition and read the value. To do this, assign the topic-partitions you want to your consumer with Consumer#assign(), then use Consumer#seek() to seek to the offset you want to read. When you poll(), the consumer will start reading from that offset.

Do I have to use a topic with a single partition to achieve this goal?

You don't have to do this. You can read whatever offsets you want from whatever partitions you want.

answered Feb 02 '18 at 22:23

Dmitry Minkovsky


I do not want to obtain duplicated offsets, which is why I am thinking of using a topic with a single partition as a solution. – DaliMidou Feb 02 '18 at 22:30
Each partition has offsets 0 ... infinity. So you'll have offset 123 in each of your partitions. They are not duplicates, because a record is really identified by partition and offset: 0-123 (partition 0), 1-123 (partition 1), etc. – Dmitry Minkovsky Feb 02 '18 at 22:31
Let me rephrase my question: do the offsets start at zero in each partition, or do they continue from the offsets of the previous partition? – DaliMidou Feb 02 '18 at 22:34
Yes, each partition starts with 0. Partitions should be thought of as "parallel". There are no previous or later partitions. – Dmitry Minkovsky Feb 02 '18 at 22:35
The fact that partitions are parallel is what enables parallel processing and distribution in Kafka. – Dmitry Minkovsky Feb 02 '18 at 22:35

Oh I just understood the phrasing of your original question. Sorry for the confusion. Yes, each partition has its own offsets from 0, but they’re not duplicates. – Dmitry Minkovsky Feb 02 '18 at 22:55
If I use one producer that writes to a topic (created with 10 partitions) and a single consumer that reads from it, will the partitions be processed in parallel? Does the producer write to the partitions in parallel or consecutively (when a partition is full, it uses another one)? – DaliMidou Feb 05 '18 at 15:57
It depends on how you configure the consumer. If it is the only consumer in a consumer group, it will read from all the partitions. Likewise, it depends on how you use the producer. The partition can be explicitly selected, or you can specify a key for the message and the key will be hashed and a partition will be chosen based on the hash. Partitions don't become "full". – Dmitry Minkovsky Feb 05 '18 at 16:00
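The key-hashing rule described in the comment above can be sketched roughly as follows. This is a hypothetical illustration, not Kafka's actual partitioner: KeyPartitionSketch and partitionFor are invented names, and Arrays.hashCode is only a stand-in for the murmur2 hash that Kafka's default partitioner applies to the serialized key. The point is just that the same key always maps to the same partition, and nothing in the rule depends on how much data a partition already holds.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of keyed partition selection; NOT Kafka's actual partitioner.
class KeyPartitionSketch {
    // Maps a key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for the murmur2 hash Kafka uses on the key bytes.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 10;
        // "user-1" appears twice and lands on the same partition both times.
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Because the partition is a pure function of the key, records with different keys are spread across all partitions from the first message on, rather than filling one partition before moving to the next.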
Check out https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html and https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html and https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/producer/ProducerRecord.html – Dmitry Minkovsky Feb 05 '18 at 16:01
I set "log.cleaner.enable=false", which is why I thought a partition might become "full". In my program, I have one producer and a single consumer that write to and read from a single topic. Performance is much better when I create the topic with 10 partitions than when I create it with a single partition. I want to understand the reason. – DaliMidou Feb 05 '18 at 16:08
