
What class or method in Kafka Streams can we use to serialize a Java object to a byte array, or deserialize a byte array back into an object? The following link proposes using ByteArrayOutputStream and ObjectOutputStream, but they are not thread-safe.

Send Custom Java Objects to Kafka Topic

Another option is Jackson's ObjectMapper/ObjectReader (which are thread-safe), but that converts POJO -> JSON -> byte array, which seems like an expensive detour. I wanted to check whether there is a direct, thread-safe way to translate an object into a byte array and vice versa. Please suggest.

```java
import java.util.Map;

import org.apache.kafka.common.serialization.Serializer;

public class HouseSerializer<T> implements Serializer<T> {
    private Class<T> tClass;

    public HouseSerializer() {
    }

    @SuppressWarnings("unchecked")
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // "POJOClass" is a custom config key supplied by the application
        tClass = (Class<T>) configs.get("POJOClass");
    }

    @Override
    public void close() {
    }

    @Override
    public byte[] serialize(String topic, T data) {
        // Object serialization to be performed here
        return null;
    }
}
```


Note: Kafka version - 0.10.1

Raman

1 Answer


Wanted to check if there is a direct way to translate object into bytearray

I would suggest using Avro serialization, ideally with the Confluent Schema Registry (recommended, but not required). JSON is a good fallback, but it takes more space "on the wire", so MsgPack (MessagePack) would be the more compact alternative there.

See Avro code example here

The example above uses the avro-maven-plugin to generate a LogLine class from the schema file in src/main/resources/avro.


Otherwise, it's up to you how to serialize your object into a byte array. For example, a String is commonly packed as

[(length of string) (UTF-8 encoded bytes)]

while a boolean is a single byte holding 0 or 1.
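A minimal sketch of that hand-rolled layout, using only java.io's DataOutputStream and DataInputStream. The House class with its address and forSale fields is hypothetical, standing in for the question's POJO, and the 4-byte int length prefix is an assumption (other formats use variable-length integers). DataOutputStream.writeUTF is deliberately avoided because it writes a 2-byte length plus modified UTF-8 rather than standard UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical POJO standing in for the question's object type.
final class House {
    final String address;
    final boolean forSale;

    House(String address, boolean forSale) {
        this.address = address;
        this.forSale = forSale;
    }
}

// Hand-rolled codec with layout [int length][UTF-8 bytes][1 boolean byte].
// It keeps no fields, so it can be called from any number of threads.
final class HouseCodec {
    static byte[] toBytes(House house) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] utf8 = house.address.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length);        // length prefix (4 bytes, big-endian)
            out.write(utf8);                  // standard UTF-8 encoded bytes
            out.writeBoolean(house.forSale);  // single byte: 1 = true, 0 = false
        } catch (IOException e) {
            // ByteArrayOutputStream does not throw, but the API declares IOException
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static House fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            byte[] utf8 = new byte[in.readInt()];  // read the length prefix first
            in.readFully(utf8);                    // then exactly that many bytes
            boolean forSale = in.readBoolean();
            return new House(new String(utf8, StandardCharsets.UTF_8), forSale);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside a Kafka `Serializer<House>`, `serialize(topic, data)` could simply return `HouseCodec.toBytes(data)`, with a matching `Deserializer` calling `HouseCodec.fromBytes`. Because the codec keeps no state, thread safety is not an issue; the trade-off is that you maintain the byte layout (and any versioning) yourself, which Avro's schemas would otherwise handle.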

which is threadsafe

I understand the concern, but you don't commonly share deserialized data between threads. Each message is sent, read, and processed independently.
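The thread-safety worry about ByteArrayOutputStream and ObjectOutputStream only applies if one stream instance is shared. Here is a minimal sketch, assuming plain java.io serialization and a hypothetical Serializable Listing class, where each call creates its own streams, so the codec holds no shared state. The helper roundTripConcurrently and the pool size of 4 are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical message type; java.io serialization requires Serializable.
final class Listing implements Serializable {
    private static final long serialVersionUID = 1L;
    final String address;
    final int price;

    Listing(String address, int price) {
        this.address = address;
        this.price = price;
    }
}

// Stateless codec: every call creates fresh streams, so nothing is shared between threads.
final class JavaSerdeSketch {
    static byte[] serialize(Object data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Read the bytes only after close() has flushed ObjectOutputStream's internal buffer
        return buffer.toByteArray();
    }

    static <T> T deserialize(byte[] bytes, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Round-trips n messages on a thread pool and returns the decoded prices in order.
    static List<Integer> roundTripConcurrently(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int price = i;
                results.add(pool.submit(() -> {
                    byte[] bytes = serialize(new Listing("House " + price, price));
                    return deserialize(bytes, Listing.class).price;
                }));
            }
            List<Integer> prices = new ArrayList<>();
            for (Future<Integer> f : results) {
                prices.add(f.get());
            }
            return prices;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The codec itself is safe to call concurrently. Java serialization still ties producers and consumers to the same Java class (and its serialVersionUID), which is one reason to prefer Avro or JSON for data on Kafka topics.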

OneCricketeer
  • Thanks for the response. I understand that each message is a separate Kafka record instance, but to convert it to a Java object and then serialize/deserialize it, we could potentially run into race conditions. I was hoping there is a way to convert the data value to a byte array within the "serialize" method of the class above. – Raman May 17 '18 at 18:49
  • Avro/JSON does that for you, and both can be used across many different languages rather than only with "default Java Object serialization". You don't need to produce a byte array yourself; just use the StringSerializer if you're using JSON. – OneCricketeer May 17 '18 at 18:56