I am using kafka_2.10-0.10.0.1 with zookeeper-3.4.10. I know that there are many types of offsets. I have two questions:
- I want to know the type of the offset returned by ConsumerRecord.offset().
- If I use a topic created with 10 partitions, can I obtain a set of records with the same offset value? In my program, I need to obtain a list of records with different offset values. Do I have to use a topic with a single partition to achieve this goal?
– DaliMidou
- Many types of offsets? What do you mean? – Dmitry Minkovsky Feb 02 '18 at 21:52
- I read that there are offsets stored in Zookeeper and others stored in the Kafka broker... – DaliMidou Feb 02 '18 at 22:10
- If you're using a client that's not pre-0.10 (your version of Kafka), it should store its offsets in Kafka. See https://stackoverflow.com/questions/41137281/offsets-stored-in-zookeeper-or-kafka. Anyway, your client should be storing its offsets in either ZK or Kafka. It would be bad if it were both. – Dmitry Minkovsky Feb 02 '18 at 22:13
- There are actually three types of offsets. See https://stackoverflow.com/questions/27499277/number-of-commits-and-offset-in-each-partition-of-a-kafka-topic – DaliMidou Feb 02 '18 at 22:14
- Do you have an answer for my second question, please? – DaliMidou Feb 02 '18 at 22:15
- Oh, in that sense. I've never heard those called types of offsets. Thanks for the link. – Dmitry Minkovsky Feb 02 '18 at 22:17
1 Answer
I want to know the type of the offset returned by ConsumerRecord.offset().
This is the offset of the record within the topic-partition the record came from.
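For illustration, here is a minimal poll loop (broker address, group id, and topic name are placeholders) that prints each record's partition alongside its offset, since an offset only identifies a position within that one partition:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetPrinter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "offset-demo");              // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic name

        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            // offset() is only meaningful together with partition(): it is the
            // record's position within that single partition's log.
            System.out.printf("partition=%d offset=%d value=%s%n",
                    record.partition(), record.offset(), record.value());
        }
        consumer.close();
    }
}
```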
If I use a topic created with 10 partitions, can I obtain a set of records with the same offset value?
Yes, you can seek to that offset in each partition and read the value. To do this, assign the topic-partitions you want to your consumer with Consumer#assign(), then use Consumer#seek() to seek to the offset you want to read. When you poll(), the consumer will start reading from that offset.
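A sketch of that approach, assuming a 10-partition topic named my-topic and a target offset of 123 (both placeholders, as are the broker address and group id):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

public class SeekToSameOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "seek-demo");                // placeholder group id
        props.put("enable.auto.commit", "false");          // we position the consumer manually
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        String topic = "my-topic";  // placeholder: the 10-partition topic
        long targetOffset = 123L;   // placeholder: the offset to read in every partition

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // Manually assign every partition of the topic (no consumer-group rebalancing).
        List<TopicPartition> partitions = new ArrayList<>();
        for (PartitionInfo info : consumer.partitionsFor(topic)) {
            partitions.add(new TopicPartition(topic, info.partition()));
        }
        consumer.assign(partitions);

        // Seek to the same offset in each assigned partition.
        for (TopicPartition tp : partitions) {
            consumer.seek(tp, targetOffset);
        }

        // poll() now returns records starting at targetOffset in every partition;
        // the first record read from each partition carries the same offset value.
        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("partition=%d offset=%d value=%s%n",
                    record.partition(), record.offset(), record.value());
        }
        consumer.close();
    }
}
```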
Do I have to use a topic with a single partition to achieve this goal?
You don't have to do this. You can read whatever offsets you want from whatever partitions you want.

– Dmitry Minkovsky
- I do not want to obtain duplicated offsets; that's why I was thinking of using a topic with a single partition as a solution. – DaliMidou Feb 02 '18 at 22:30
- Each partition has offsets 0 ... infinity. So you'll have offset 123 in each of your partitions. They are not duplicates, because it's actually offset 0-123 (for partition 0), 1-123 (partition 1), etc. – Dmitry Minkovsky Feb 02 '18 at 22:31
- Let me reformulate my question: do the offsets start at zero in each partition, or do they continue from the offsets of the previous partition? – DaliMidou Feb 02 '18 at 22:34
- Yes, each partition starts at 0. Partitions should be thought of as "parallel". There are no previous or later partitions. – Dmitry Minkovsky Feb 02 '18 at 22:35
- The fact that partitions are parallel is what enables parallel processing and distribution in Kafka. – Dmitry Minkovsky Feb 02 '18 at 22:35
- Oh, I just understood the phrasing of your original question. Sorry for the confusion. Yes, each partition has its own offsets starting from 0, but they're not duplicates. – Dmitry Minkovsky Feb 02 '18 at 22:55
- If I use one producer which writes to a topic (created with 10 partitions) and a single consumer which reads from it, will the partitions be processed in parallel? Does the producer write to the partitions in parallel or consecutively (when a partition is full, it uses another one)? – DaliMidou Feb 05 '18 at 15:57
- It depends on how you configure the consumer. If it is the only consumer in a consumer group, it will read from all the partitions. Likewise, it depends on how you use the producer. The partition can be explicitly selected, or you can specify a key for the message; the key will be hashed and a partition will be chosen based on the hash. Partitions don't become "full". – Dmitry Minkovsky Feb 05 '18 at 16:00 [see the producer sketch after these comments]
- Check out https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html and https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html and https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/producer/ProducerRecord.html – Dmitry Minkovsky Feb 05 '18 at 16:01
- I set "log.cleaner.enable=false"; that's why I think a partition may become "full". In my program, I have one producer and a single consumer which write to and read from a single topic. The performance is much better when I create the topic with 10 partitions than when I create it with a single partition. I want to understand the reason. – DaliMidou Feb 05 '18 at 16:08