
I want to implement an Avro serializer/deserializer for a Kafka producer/consumer. There are multiple possible scenarios:

  1. The writer schema and the reader schema are the same and will never change. In that scenario there is no need to send the Avro schema along with the payload; the consumer can use its own reader schema to deserialize the payload. A sample implementation is provided in this post.
  2. Using the schema resolution feature when the schema evolves over time. Avro can still deserialize with different reader and writer schemas by applying its schema resolution rules, so we need to send the Avro schema along with the payload.

My question: how do I send the schema as well while producing, so that the deserializer can read the whole byte array and separate out the actual payload and the schema? I am using an Avro-generated class. Note: I don't want to use a schema registry.


1 Answer


You need a reader and a writer schema in any Avro use case, even if they are the same. SpecificDatumWriter (for the serializer) and SpecificDatumReader (for the deserializer) both take a schema.
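To make the reader/writer pair concrete, here is a minimal sketch using Avro's generic API (SpecificDatumReader has an analogous `(writerSchema, readerSchema)` constructor for generated classes). The `User` schemas and the class name are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaResolutionDemo {

    // Serialize with the writer schema, then deserialize with a
    // (writer, reader) pair: schema resolution fills in the reader
    // schema's defaulted "age" field, which the writer never wrote.
    static GenericRecord roundTrip() throws Exception {
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"}]}");
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\",\"default\":-1}]}");

        GenericRecord record = new GenericData.Record(writerSchema);
        record.put("name", "alice");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(record, encoder);
        encoder.flush();

        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
            .read(null, decoder);
    }

    public static void main(String[] args) throws Exception {
        GenericRecord decoded = roundTrip();
        System.out.println(decoded.get("name") + " " + decoded.get("age"));
    }
}
```

The key point is that the reader side must know which schema the bytes were written with; the reader schema alone is not enough once the two can differ.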

You could use Kafka record headers to encode the AVSC string and send it along with the payload, but keep in mind that Kafka records/batches have an upper bound on allowed size. Using a schema registry (it doesn't have to be Confluent's) reduces the overhead from a whole schema string to a simple integer ID.
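If you would rather keep everything inside the record value instead of a header, one simple option is a length-prefixed frame: a 4-byte schema length, the schema JSON, then the Avro payload. This is a plain-JDK sketch of the idea only; the frame layout, class, and method names are illustrative, not any standard:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SchemaFraming {

    // Frame layout (illustrative): [4-byte schema length][schema JSON][Avro payload]
    static byte[] frame(String schemaJson, byte[] payload) {
        byte[] schema = schemaJson.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + schema.length + payload.length);
        buf.putInt(schema.length).put(schema).put(payload);
        return buf.array();
    }

    // Read the length prefix and recover the schema JSON.
    static String schemaOf(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        byte[] schema = new byte[buf.getInt()];
        buf.get(schema);
        return new String(schema, StandardCharsets.UTF_8);
    }

    // Everything after the schema bytes is the Avro payload.
    static byte[] payloadOf(byte[] framed) {
        int len = ByteBuffer.wrap(framed).getInt();
        return Arrays.copyOfRange(framed, 4 + len, framed.length);
    }

    public static void main(String[] args) {
        byte[] framed = frame("{\"type\":\"string\"}", new byte[]{1, 2, 3});
        System.out.println(schemaOf(framed));
        System.out.println(payloadOf(framed).length);
    }
}
```

The deserializer would parse `schemaOf(...)` as the writer schema and decode `payloadOf(...)` with it. Note that this repeats the full AVSC string on every record, which is exactly the overhead a registry's integer ID avoids.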

OneCricketeer
  • I found another post where you suggested a similar approach (https://stackoverflow.com/a/66801044/2653389), and I can use that. One query on "You need a reader and writer schema, in any Avro use-case, even if they are the same": the consumer will have the Avro-generated class or the schema string (which must be shared by the producer as a contract), so why would we need a separate reader schema in that case? Also, there is a single-parameter constructor of type `Schema` in `GenericDatumReader` – Pintu Jan 25 '23 at 19:14
  • If you aren't required to use Avro, then Protobuf, Cap'n Proto, or JSON Smile might also work – OneCricketeer Jan 25 '23 at 19:18
  • For the generated class, there is a schema embedded in that class. That's what is used for both reader and writer, but the writer schema can still be overridden – OneCricketeer Jan 25 '23 at 19:18
  • I was thinking of a scenario where the writer schema won't evolve or get overridden (maybe an optimistic case) :-) Yes, I meant we can use the "schema part of that class" as both reader and writer, with no need to explicitly send the reader schema along with the data – Pintu Jan 25 '23 at 19:23