0

I want to measure the time Kafka takes to serialize different data formats, and I am not sure whether I can do this on my end, since I think serialization happens on the Kafka side. If I can, how do I do it? Is serialization done after the message.send() call? I also looked through the available Kafka monitoring metrics and found nothing related to this in the documentation. I saw request-latency-avg as a possible metric, but its values seem too high to cover only the serialization part. Could anybody suggest an approach?

humble_me

2 Answers

0

Kafka ships built-in serializers and deserializers for a number of types, such as String, Long, byte arrays, and ByteBuffer, and the community provides ones for JSON, Protobuf, and Avro.

If your focus is serialization and deserialization performance, you can check the results of this benchmark: https://labs.criteo.com/2017/05/serialization/

where the author concluded:

Protobuf and Thrift have similar performances, in terms of file sizes and serialization/deserialization time. The slightly better performances of Thrift did not outweigh the easier and less risky integration of Protobuf as it was already in use in our systems, thus the final choice. Protobuf also has a better documentation, whereas Thrift lacks it. Luckily there was the missing guide (https://diwakergupta.github.io/thrift-missing-guide/#_types) that helped us implement Thrift quickly for benchmarking.

Avro should not be used if your objects are small. But it looks interesting for its speed if you have very big objects and don’t have complex data structures as they are difficult to express. Avro tools also look more targeted at the Java world than cross-language development. The C# implementation’s bugs and limitations are quite frustrating.

  • Hey, thanks a lot for your answer. Could you confirm whether Kafka serialization is done after we call message.send()? Also, since I need to test some custom formats that differ from those in the blog above, do you know how I could run this kind of test myself, in a manner similar to the blog? – humble_me Nov 28 '19 at 13:04
  • Without knowing which library you are using it is hard to say, but Kafka itself doesn't care about your message encoding, so in most cases you have to send the message already encoded in your format. To evaluate the formats, you should do it in isolation from other factors such as the network and Kafka. To evaluate your Kafka solution, you could simply create a producer that sends messages in parallel, plus a consumer, and measure the throughput, then change the format and repeat. –  Nov 28 '19 at 13:19
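The isolated comparison suggested in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the sample records, the three candidate formats, and the helper name `time_serializer` are all made up for the example, and only the Python standard library is used so that no broker or network is involved.

```python
import json
import pickle
import struct
import time


def time_serializer(serialize, records, repeats=5):
    """Return the best-of-`repeats` average time per record, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for record in records:
            serialize(record)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(records))
    return best


# Hypothetical sample payload; replace with records shaped like your real data.
records = [{"id": i, "name": f"user-{i}", "score": i * 0.5} for i in range(10_000)]

# Candidate formats (assumptions for this sketch), each mapping a record to bytes.
serializers = {
    "json": lambda r: json.dumps(r).encode("utf-8"),
    "pickle": lambda r: pickle.dumps(r),
    # Hand-rolled binary layout: int64 id, float64 score, then the UTF-8 name.
    "struct": lambda r: struct.pack("<qd", r["id"], r["score"]) + r["name"].encode("utf-8"),
}

results = {}
for name, fn in serializers.items():
    results[name] = {
        "seconds_per_record": time_serializer(fn, records),
        "bytes_first_record": len(fn(records[0])),
    }
    print(name, results[name])
```

Taking the best of several runs reduces noise from other processes. The absolute numbers depend on the machine and the data, so they are only meaningful relative to each other.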
0

Kafka doesn't expose any API for measuring serializer/deserializer performance, and there is little point in measuring it if you are using the basic serializers/deserializers. If you are really interested, you can build a custom serializer/deserializer and take the measurements there.

For building a custom serializer/deserializer, you can refer to this already answered question: Custom serializer/deserializer
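As a rough sketch of taking the measurement inside a custom serializer, the wrapper below times each call to an underlying serializer function. It relies on the serializer running in the producer client as part of the send call, which is how the Java client and kafka-python behave; whether that holds for your setup depends on the library you use. The names `TimedSerializer` and `to_json_bytes` are hypothetical, and the wrapper is called directly here so that the example runs without a broker.

```python
import json
import time


class TimedSerializer:
    """Wrap a serializer callable and accumulate the time spent inside it."""

    def __init__(self, serialize):
        self._serialize = serialize
        self.calls = 0
        self.total_seconds = 0.0

    def __call__(self, value):
        start = time.perf_counter()
        try:
            return self._serialize(value)
        finally:
            # Record the elapsed time even if serialization raises.
            self.total_seconds += time.perf_counter() - start
            self.calls += 1

    @property
    def average_seconds(self):
        return self.total_seconds / self.calls if self.calls else 0.0


def to_json_bytes(value):
    """Example format under test; swap in your own custom serializer."""
    return json.dumps(value).encode("utf-8")


timed = TimedSerializer(to_json_bytes)

# Assumption: with kafka-python, the wrapper could be plugged in as
#   KafkaProducer(value_serializer=timed)
# Here it is invoked directly to keep the sketch self-contained.
for i in range(1000):
    payload = timed({"id": i, "msg": "hello"})

print(timed.calls, timed.average_seconds)
```

Because the timer wraps only the serializer call, the result excludes batching, network, and broker time, which is why it should come out much lower than request-latency-avg.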

Nitin