-1

I have created a topic in Kafka with partition count 3 now in all these three partitions I want to push unique messages. Is there any way to do it? I checked producer.send pushes duplicate messages on all partitions.

For testing I am using following code:

from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')

But it is sending duplicate messages on partitions.

Avinash
  • 2,093
  • 4
  • 28
  • 41

2 Answers2

0

Add a key to your messages. Kafka's default partitioner will ensure all messages with duplicate keys will go to the same partition. You can use an md5 hash of the message value as the message key.

Hans Jespersen
  • 8,024
  • 1
  • 24
  • 31
0

From https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html#kafka.KafkaProducer.send:

future = producer.send(topic='my-topic', value= b'raw_bytes', key=None, partition=None, timestamp_ms=None)

So you can manually assign the destination partition yourself, although this is not recommended because what if you need to expand your topic with additional partitions? You don't want to have to update your code...

Or you can specify custom keys. A md5 sum should make for a relatively equal distribution, you can see how to create that in this answer: https://stackoverflow.com/a/5297483/770425

Jeff Widman
  • 22,014
  • 12
  • 72
  • 88