36

I am using the Python high level consumer for Kafka and want to know the latest offsets for each partition of a topic. However I cannot get it to work.

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.assign(ps)
for p in ps:
    print "For partition %s highwater is %s"%(p.partition,con.highwater(p))

print "Subscription = %s"%con.subscription()
print "con.seek_to_beginning() = %s"%con.seek_to_beginning()

But the output I get is

For partition 0 highwater is None
For partition 1 highwater is None
For partition 2 highwater is None
For partition 3 highwater is None
For partition 4 highwater is None
For partition 5 highwater is None
....
For partition 96 highwater is None
For partition 97 highwater is None
For partition 98 highwater is None
For partition 99 highwater is None
Subscription = None
con.seek_to_beginning() = None
con.seek_to_end() = None

I have an alternate approach using assign but the result is the same

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.assign(ps)
for p in ps:
    print "For partition %s highwater is %s"%(p.partition,con.highwater(p))

print "Subscription = %s"%con.subscription()
print "con.seek_to_beginning() = %s"%con.seek_to_beginning()
print "con.seek_to_end() = %s"%con.seek_to_end()

It seems from some of the documentation that I might get this behaviour if a fetch has not been issued. But I cannot find a way to force that. What am I doing wrong?

Or is there a different/simpler way to get the latest offsets for a topic?

Saket
  • Not 100% positive, but I think your code is returning the value of highwater before `kafka-python` has actually connected to the broker. Since `KafkaConsumer` is async, I think you have to actually consume a message for the highwater value to be populated: https://github.com/dpkp/kafka-python/issues/509#issuecomment-178114516 – Jeff Widman Jan 25 '17 at 22:22

8 Answers

36

Finally, after spending a day on this and several false starts, I was able to find a solution and get it working. Posting it here so that others may refer to it.

from kafka import SimpleClient
from kafka.protocol.offset import OffsetRequest, OffsetResetStrategy
from kafka.common import OffsetRequestPayload

client = SimpleClient(brokers)

partitions = client.topic_partitions[topic]
offset_requests = [OffsetRequestPayload(topic, p, -1, 1) for p in partitions.keys()]

offsets_responses = client.send_offset_request(offset_requests)

for r in offsets_responses:
    print "partition = %s, offset = %s"%(r.partition, r.offsets[0])
Jeff Widman
Saket
  • 1
    Is there a way to get the current/next offset per consumer/group per partition? – GreenThumb Jun 28 '17 at 05:19
  • 4
    Sadly, the SimpleClient has been deprecated, and the offsets_responses above yields a FailedPayloadsError: FailedPayloadsError – dreynold Feb 12 '18 at 23:45
  • 1
    @dreynold it worked for me, but Itamar Lavender's answer using the non-deprecated parts below works too. If you don't have a group yet, skip the "lag" part and that works as well. – exic Oct 09 '18 at 12:16
28

If you wish to use Kafka shell scripts present in kafka/bin, then you can get latest and smallest offsets by using kafka-run-class.sh.

To get the latest offset, the command looks like this:

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -1 --topic topicname

To get the smallest offset, the command looks like this:

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --time -2 --topic topicname
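If the output of either command needs to be consumed from a script, it can be parsed in a few lines. This is only a sketch: it assumes GetOffsetShell prints one `topic:partition:offset` line per partition, and the function name and sample values are made up for illustration.

```python
def parse_get_offset_shell(output):
    """Parse lines of the assumed form topic:partition:offset into a dict."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last two fields are read as numbers.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


# Made-up sample output for a two-partition topic.
sample = "topicname:0:120\ntopicname:1:98\n"
print(parse_get_offset_shell(sample))
```

In practice the `sample` string would be replaced by the captured stdout of the command, for example via `subprocess.run(..., capture_output=True, text=True).stdout`.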

You can find more information on Get Offsets Shell from following link

Hope this helps!

Saket
avr
20
from kafka import KafkaConsumer, TopicPartition

TOPIC = 'MYTOPIC'
GROUP = 'MYGROUP'
BOOTSTRAP_SERVERS = ['kafka01:9092', 'kafka02:9092']

consumer = KafkaConsumer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        group_id=GROUP,
        enable_auto_commit=False
    )


for p in consumer.partitions_for_topic(TOPIC):
    tp = TopicPartition(TOPIC, p)
    consumer.assign([tp])
    committed = consumer.committed(tp)
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
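    # committed is None if the group has never committed an offset for this partition
    lag = last_offset - committed if committed is not None else None
    print("topic: %s partition: %s committed: %s last: %s lag: %s" % (TOPIC, p, committed, last_offset, lag))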

consumer.close(autocommit=False)
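Note that `committed()` returns None until the group has committed an offset for a partition, so lag is only defined once a commit exists. A possible variant, sketched under the assumption that kafka-python is recent enough to provide `end_offsets` (1.3.4 or later), looks up all end offsets in one call and does not move the consumer position. The helper name `lag_per_partition` is hypothetical.

```python
def lag_per_partition(consumer, partitions):
    """Map each partition to (committed, end_offset, lag).

    `consumer` is assumed to be a kafka-python KafkaConsumer created with a
    group_id; `partitions` is a list of TopicPartition objects.
    """
    # One lookup for all partitions; this does not seek the consumer.
    end_offsets = consumer.end_offsets(partitions)
    report = {}
    for tp in partitions:
        committed = consumer.committed(tp)  # None if the group never committed
        end = end_offsets[tp]
        lag = None if committed is None else end - committed
        report[tp] = (committed, end, lag)
    return report
```

It would be called as `lag_per_partition(consumer, [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)])`, using the names from the snippet above.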
Itamar Lavender
  • 1
    As this question still draws attention, I wanted to explain why my answer above doesn't really answer the question: in my opinion, the last offset of a topic/partition is only relevant in the context of a consumer group. Kafka is built for many consumer groups consuming the same data from the same topics; all I find important is a group's rate of consumption or, more importantly, its lag. – Itamar Lavender Nov 22 '19 at 12:02
11

With kafka-python>=1.3.4 you can use:

kafka.KafkaConsumer.end_offsets(partitions)

Get the last offset for the given partitions. The last offset of a partition is the offset of the upcoming message, i.e. the offset of the last available message + 1.

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer

con = KafkaConsumer(bootstrap_servers = brokers)
ps = [TopicPartition(topic, p) for p in con.partitions_for_topic(topic)]

con.end_offsets(ps)
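`end_offsets` returns a dict keyed by TopicPartition. Combined with `beginning_offsets`, which kafka-python also provides, it gives the offset range currently retained in each partition. A rough sketch, with a hypothetical helper name:

```python
def retained_range(consumer, partitions):
    """Return {partition: (first_offset, next_offset, span)}.

    `consumer` is assumed to be a kafka-python KafkaConsumer and `partitions`
    a list of TopicPartition objects.
    """
    first = consumer.beginning_offsets(partitions)  # earliest available offset
    last = consumer.end_offsets(partitions)         # offset of the next message
    # span is only an upper bound on the message count: compacted topics
    # and transaction markers can leave gaps in the offset sequence.
    return {tp: (first[tp], last[tp], last[tp] - first[tp]) for tp in partitions}
```

With the names from the snippet above, `retained_range(con, ps)` would return one tuple per partition.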
a.costa
4

Another way to achieve this is to poll the consumer to obtain the last consumed offset, and then call the seek_to_end method to move to the most recent available offset of each assigned partition.

from kafka import KafkaConsumer
consumer = KafkaConsumer('my-topic',
                     group_id='my-group',
                     bootstrap_servers=['localhost:9092'])
consumer.poll()
consumer.seek_to_end()

This method particularly comes in handy when using consumer groups.

SOURCES:

  1. https://kafka-python.readthedocs.io/en/master/apidoc/kafka.consumer.html#kafka.consumer.KafkaConsumer.poll
  2. https://kafka-python.readthedocs.io/en/master/apidoc/kafka.consumer.html#kafka.consumer.KafkaConsumer.seek_to_end
olujedai
  • My server has hundreds of messages, yet consumer.poll() returned {} – Nick Feb 09 '18 at 22:45
  • 1
    This could happen if you are running more consumer instances than there are partitions for that topic. – olujedai May 16 '18 at 14:19
  • 1
    Good point. I was able to after the fact determine we weren't calling .close, so that very circumstance occurred, but we thought there was only 1. – Nick May 16 '18 at 18:03
3

Using confluent-kafka-python

You can use position:

Retrieve current positions (offsets) for the list of partitions.

from confluent_kafka import Consumer, TopicPartition


consumer = Consumer({"bootstrap.servers": "localhost:9092"})
topic = consumer.list_topics(topic='topicName')
partitions = [TopicPartition('topicName', partition) for partition in list(topic.topics['topicName'].partitions.keys())] 

offset_per_partition = consumer.position(partitions)

Alternatively, you can also use get_watermark_offsets but you'd have to pass one partition at a time and thus it requires multiple calls:

Retrieve low and high offsets for partition.

from confluent_kafka import Consumer, TopicPartition


consumer = Consumer({"bootstrap.servers": "localhost:9092"})
topic = consumer.list_topics(topic='topicName')
partitions = [TopicPartition('topicName', partition) for partition in list(topic.topics['topicName'].partitions.keys())] 

for p in partitions:
    low_offset, high_offset = consumer.get_watermark_offsets(p)
    print(f"Latest offset for partition {p}: {high_offset}")
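As far as I know, `get_watermark_offsets` also accepts a `timeout` argument and returns None when the broker does not answer in time, in which case the tuple unpacking above would fail. A small sketch that collects the high watermarks and tolerates timeouts follows; the helper name and the default timeout are assumptions made for illustration.

```python
def high_watermarks(consumer, partitions, timeout=5.0):
    """Return {partition_id: high_offset or None} for the given partitions.

    `consumer` is assumed to be a confluent_kafka Consumer and `partitions`
    a list of confluent_kafka TopicPartition objects.
    """
    result = {}
    for tp in partitions:
        offsets = consumer.get_watermark_offsets(tp, timeout=timeout)
        # offsets is a (low, high) tuple, or None if the request timed out.
        result[tp.partition] = None if offsets is None else offsets[1]
    return result
```

With the names from the snippet above, `high_watermarks(consumer, partitions)` would give one entry per partition.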

Using kafka-python

You can use end_offsets:

Get the last offset for the given partitions. The last offset of a partition is the offset of the upcoming message, i.e. the offset of the last available message + 1.

This method does not change the current consumer position of the partitions.

from kafka import TopicPartition
from kafka.consumer import KafkaConsumer


consumer = KafkaConsumer(bootstrap_servers = "localhost:9092" )
partitions = [TopicPartition('myTopic', p) for p in consumer.partitions_for_topic('myTopic')]
last_offset_per_partition = consumer.end_offsets(partitions)
Giorgos Myrianthous
0

kafka-consumer-groups --bootstrap-server host1:9093,crow-host2:9093,host3:9093 --command-config=/root/client.properties --describe --group atlas

This command shows the status of the group: the current offset, log-end offset and lag for each partition.

0

Using kafka-python

When defining the consumer, the auto_offset_reset argument can be set to either 'earliest' or 'latest'. This is useful in case the consumer starts after the retention period has passed, or restarts after a failure: messages are then consumed according to the auto.offset.reset configuration.

from json import loads

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='latest',
    enable_auto_commit=True,
    group_id='my-group',
    # Decode each message value from UTF-8 encoded JSON
    value_deserializer=lambda x: loads(x.decode('utf-8')))

see this example.

shafee
martand