
I have a system that sends some data to a Kafka broker in Avro format. It looks like the Confluent Platform is best suited to receive, consume, and process it further; however, it's huge and heavy.

I'm trying to use plain Apache Kafka and write my own consumer code in Python.

Is that really possible? All the code samples I've found rely on Confluent; in particular, they require the Schema Registry, which is not part of stock Apache Kafka.

lenenen
  • "Huge and heavy"? Schema Registry is just a web app. You already have Apache Kafka, Confluent doesn't have anything different. Kafka Connect is already in your system too. You don't seem like you need the REST Proxy... So what other OSS components are you considering "heavy"? – OneCricketeer Jan 17 '20 at 01:04
  • In fact, not using the registry actually makes your Avro messages take more space in each message and therefore more network bandwidth – OneCricketeer Jan 17 '20 at 01:08
  • I don't think this question should have been closed as a duplicate. The question is how to use a simple Kafka producer in Python, as opposed to an Avro producer. The answer on the linked question uses a schema; I think the OP just wants to know how to send a raw Kafka message the same way the console producer would, only in Python. – AnonymousAlias Jan 17 '20 at 14:55

1 Answer


You may find some inspiration here: How to decode/deserialize Kafka Avro strings with Python

Because Avro is nothing more than a serialization format and Kafka messages are simply byte arrays, you can always grab the raw bytes downstream and parse them with your producer's Avro schema (or one you write yourself).

Note - Confluent's KafkaAvroSerializer prepends a magic byte (0) and a 4-byte schema ID (assigned by Confluent's Schema Registry) to each message. You'll need to skip that 5-byte prefix if a KafkaAvroSerializer is part of your data pipeline.
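Stripping that prefix needs nothing beyond the standard library; a small sketch, assuming the documented Confluent wire format (magic byte 0 followed by a big-endian 4-byte schema ID):

```python
import struct


def strip_confluent_header(raw: bytes) -> tuple[int, bytes]:
    """Split a Confluent-framed message into (schema_id, avro_payload).

    Wire format: 1 magic byte (0) + 4-byte big-endian schema ID + Avro body.
    """
    if len(raw) < 5 or raw[0] != 0:
        raise ValueError("not in Confluent wire format")
    schema_id = struct.unpack(">I", raw[1:5])[0]
    return schema_id, raw[5:]


# Demo with a fabricated message: magic byte, schema ID 42, then the payload.
msg = b"\x00" + struct.pack(">I", 42) + b"avro-payload"
schema_id, payload = strip_confluent_header(msg)
```

The remaining `payload` bytes can then be handed to a plain Avro decoder along with the writer schema you obtained out of band.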

OneCricketeer