12

I was wondering what types of data we can have in Kafka topics. As far as I know, at the application level a message is a key-value pair, and the key and value can be of any type supported by the language. For example, when we send messages to a topic, can they be JSON, Parquet files, or other serialized data, or can we only work with messages in plain-text format?

Thanks for your help.

Fateax

2 Answers

11

There are various message formats, depending on whether you are talking about the APIs, the wire protocol, or the on-disk storage.

Some of these Kafka message formats are described in the docs:

https://kafka.apache.org/documentation/#messageformat

Kafka has the concept of a Serializer/Deserializer or SerDes (pronounced Sir-Deez).

https://en.m.wikipedia.org/wiki/SerDes

A Serializer is a function that takes any message and converts it into the byte array that is actually sent on the wire using the Kafka protocol.

A Deserializer does the opposite: it reads the raw message bytes portion of the Kafka wire protocol and re-creates the message as you want the receiving application to see it.
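As a rough illustration of that round trip, here is a minimal sketch in plain Python. It does not use any Kafka client library; `serialize_json` and `deserialize_json` are hypothetical names chosen for this example, and JSON is just one possible encoding.

```python
import json


def serialize_json(value) -> bytes:
    """Turn an application object into the bytes that would go on the wire."""
    return json.dumps(value).encode("utf-8")


def deserialize_json(data: bytes):
    """Rebuild the application object from the raw message bytes."""
    return json.loads(data.decode("utf-8"))


# What the sending application works with.
message = {"user": "alice", "clicks": 3}

# What a topic would actually hold: opaque bytes.
payload = serialize_json(message)

# What the receiving application sees after deserialization.
restored = deserialize_json(payload)
```

The broker never needs to understand the payload; only the two ends have to agree on the encoding.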

There are built-in SerDes for Strings, Longs, byte arrays, and ByteBuffers, and a wealth of community SerDes libraries for JSON, ProtoBuf, and Avro, as well as application-specific message formats.
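To give a feel for what the simplest built-in SerDes do, the sketch below imitates a string and a long serializer in plain Python. The function names are made up for this example. The byte layouts (UTF-8 for strings, 8-byte big-endian for longs) match my understanding of the Java client's defaults, but treat that as an assumption and check the client you actually use.

```python
import struct


def serialize_string(value: str) -> bytes:
    # Assumed default: strings are encoded as UTF-8.
    return value.encode("utf-8")


def deserialize_string(data: bytes) -> str:
    return data.decode("utf-8")


def serialize_long(value: int) -> bytes:
    # Assumed layout: signed 64-bit integer, big-endian (network byte order).
    return struct.pack(">q", value)


def deserialize_long(data: bytes) -> int:
    return struct.unpack(">q", data)[0]
```

A consumer that reads a long-keyed topic with a string deserializer would still get bytes back, just interpreted the wrong way, which is why both sides must be configured consistently.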

You can build your own SerDes libraries as well; see the following:

How to create Custom serializer in kafka?

Hans Jespersen
3

On the topic it's always just serialised data. Serialisation happens in the producer before sending, and deserialisation happens in the consumer after fetching. Serializers and deserializers are pluggable, so, as you said, at the application level it's key-value pairs of any data type you want.
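A toy model of that pluggability, in plain Python, might look like this. Everything here is hypothetical: `ToyProducer`, `ToyConsumer`, and the in-memory list standing in for a topic are not part of any Kafka client, they only show where the serializers plug in.

```python
import json
from typing import Any, Callable, List, Tuple

# A stored record is just a pair of byte strings: (key bytes, value bytes).
Record = Tuple[bytes, bytes]


class ToyProducer:
    def __init__(self, topic: List[Record],
                 key_serializer: Callable[[Any], bytes],
                 value_serializer: Callable[[Any], bytes]):
        self.topic = topic
        self.key_serializer = key_serializer
        self.value_serializer = value_serializer

    def send(self, key: Any, value: Any) -> None:
        # Serialization happens here, before anything is "sent".
        self.topic.append((self.key_serializer(key),
                           self.value_serializer(value)))


class ToyConsumer:
    def __init__(self, topic: List[Record],
                 key_deserializer: Callable[[bytes], Any],
                 value_deserializer: Callable[[bytes], Any]):
        self.topic = topic
        self.key_deserializer = key_deserializer
        self.value_deserializer = value_deserializer

    def poll(self) -> List[Tuple[Any, Any]]:
        # Deserialization happens here, after the bytes are fetched.
        return [(self.key_deserializer(k), self.value_deserializer(v))
                for k, v in self.topic]


topic: List[Record] = []  # stand-in for a Kafka topic; holds only bytes

producer = ToyProducer(
    topic,
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
consumer = ToyConsumer(
    topic,
    key_deserializer=lambda b: b.decode("utf-8"),
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

producer.send("order-1", {"amount": 10})
records = consumer.poll()
```

Swapping JSON for Avro or Protobuf would only change the four callables passed in; the stored records would remain pairs of byte strings.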

Michal Borowiecki