
I have a single-partition topic in my Kafka cluster, with a simple consumer processing all incoming messages. If an error occurs in the data being processed, I want to reprocess all messages from a certain offset (not from the beginning), in the same order, to fix the inconsistency while keeping Kafka's original message sequence.

Is there a way to do this with PyKafka? I can't figure it out.

Giuseppe

1 Answer


You need to call reset_offsets(). For example:

consumer = topic.get_simple_consumer(consumer_group="example")
partition_offset_pairs = [(p, get_offset_for_partition(p)) for p in consumer.partitions.values()]
# because we passed in a consumer_group the new offsets will be saved in Kafka
consumer.reset_offsets(partition_offsets=partition_offset_pairs)

(where get_offset_for_partition() is a function you define). Or for a single-partition topic:

# read from offset 123456
consumer = topic.get_simple_consumer()
partition = topic.partitions[0]
consumer.reset_offsets([(partition, 123456)])
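To tie this back to the question, here is a minimal sketch of a replay helper built on reset_offsets(). The names replay_from and handle are hypothetical, not part of PyKafka. It assumes the consumer was created with consumer_timeout_ms set, so that iterating over it stops once no more messages arrive, and that each message exposes an offset attribute. Because it is worth checking in your PyKafka version whether reset_offsets() treats the value as the last consumed offset or as the next one to read, the sketch rewinds one message early and filters by offset, so the result is the same either way.

```python
def replay_from(consumer, partition, start_offset, handle):
    """Call handle(message) for every message with offset >= start_offset.

    Hypothetical helper: `consumer` only needs reset_offsets() and iteration,
    which PyKafka's SimpleConsumer provides. Returns the number of messages
    passed to `handle`.
    """
    if start_offset < 1:
        # Negative values have special meanings as offsets, so this sketch
        # does not try to rewind below offset 0.
        raise ValueError("start_offset must be >= 1 in this sketch")

    # Rewind one message before the target and filter below. This works
    # whether the value means "last consumed" or "next to read".
    consumer.reset_offsets([(partition, start_offset - 1)])

    handled = 0
    for message in consumer:  # ends when the consumer times out (assumption)
        if message.offset < start_offset:
            continue  # skip the extra message from rewinding early
        handle(message)
        handled += 1
    return handled
```

With a real cluster this might be called as replay_from(topic.get_simple_consumer(consumer_timeout_ms=5000), topic.partitions[0], 123456, process), where process is your own message handler.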

The same reset_offsets() method is also available on the BalancedConsumer and ManagedBalancedConsumer classes.

Note that, by design, Kafka only guarantees message order within each topic partition, not across partitions.

rcoup