
I have two Avro schemas. One contains several union fields, all of type ["null", "string"]; the other has no union fields.
I also have POJO classes representing both schemas. The POJOs were generated with avro-tools-1.11.0.jar.

I used the following approach to convert JSON into an Avro object (for the schema that does not contain any union fields):

    Decoder decoder = DecoderFactory.get().jsonDecoder(EtmKey.getClassSchema(), "The JSON Input!");
    SpecificDatumReader<EtmKey> reader = new SpecificDatumReader<>(EtmKey.getClassSchema());
    EtmKey etmKeyDatum = reader.read(null, decoder);
    System.out.println("EtmKey topic: " + etmKeyDatum.toString());

EtmKey is the class generated from the Avro schema.

With the above code, I was able to generate the Avro object without any issues. The library I used is org.apache.avro.

However, the same approach does not work when the schema has union fields. It throws Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING. I tried manually wrapping the field values like this, "field_5": {"string": "0MI8C..."}, according to the answer here, but had no luck. Manual conversion is not an option anyway.
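For context, Avro's JSON encoding writes a non-null union value as a single-entry object keyed by the branch's type name, and a null value as a bare null, which is why the decoder reports Expected start-union for a plain string. The following is only a minimal JDK-only sketch of that target shape, not a fix for the problem itself. The class name UnionJsonWrapper, the choice of field_5 and field_9 as the union fields, and the restriction to a flat record of string values are all assumptions made for illustration; in real code the set of union fields would presumably be derived from the schema rather than hard-coded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: renders a flat record in Avro's JSON encoding for ["null", "string"] unions. */
public class UnionJsonWrapper {

    /** Wraps a string in double quotes and escapes characters that JSON does not allow raw. */
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /**
     * Builds the JSON text. Fields named in unionFields are assumed to be ["null", "string"]
     * unions; every other field is assumed to be a plain, non-null string.
     */
    public static String toAvroJson(Map<String, String> fields, Set<String> unionFields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            String name = e.getKey();
            String value = e.getValue();
            sb.append(quote(name)).append(':');
            if (!unionFields.contains(name)) {
                if (value == null) {
                    throw new IllegalArgumentException("non-union field is null: " + name);
                }
                sb.append(quote(value));            // plain string field stays as-is
            } else if (value == null) {
                sb.append("null");                  // null branch is a bare null
            } else {
                // non-null branch is tagged with its type name
                sb.append("{\"string\":").append(quote(value)).append('}');
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("field_1", "Apple");
        fields.put("field_5", "0MI8C...");   // sample value taken from the question
        fields.put("field_9", null);         // assumed to be an unset union field
        System.out.println(toAvroJson(fields, Set.of("field_5", "field_9")));
    }
}
```

Running main prints {"field_1":"Apple","field_5":{"string":"0MI8C..."},"field_9":null}, which is the form the JSON decoder expects for union fields.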

Example JSON payload:

    {
        "field_1": "Apple",
        "field_2": "123",
        "field_3": "001-123",
        "field_4": "TR501",
        "field_5": "0MI8...",
        "field_6": "0010y...",
        "field_7": "2022-12-02T22:21:19.000+0000",
        "field_8": "john.doe",
        "field_9": "005E00."
    }

Please note that JSON payload field names are the same as the ones in the Avro class.

I would like some insight into how to generate an Avro object that has union fields from JSON input. The solution should be robust, and it must use an official plugin/dependency or a reputable library such as Jackson or Gson. I would appreciate any resources, code examples, and suggestions.

Prasad
  • Does this answer your question? [How to fix Expected start-union. Got VALUE\_NUMBER\_INT when converting JSON to Avro on the command line?](https://stackoverflow.com/questions/27485580/how-to-fix-expected-start-union-got-value-number-int-when-converting-json-to-av) – tgdavies Dec 17 '22 at 11:00
  • @tgdavies thank you for the reply. I tried that answer, but it didn't work for me, and it would not be the ideal solution even if it had. I'm looking to do this dynamically. I also mentioned this in the question. – Prasad Dec 17 '22 at 15:54
  • Union types need to be objects with the type of the union to serialize into. You simply won't be able to use that JSON as-is. Besides, if you're using Jackson, why aren't you using ObjectMapper along with your POJO? As in, use the Jackson JSON mapper to parse into the POJO (it shouldn't care about Avro schemas, only Java fields), then you have your SpecificRecord type. Also, numbered fields are an anti-pattern. Perhaps you should be using a map/list instead – OneCricketeer Dec 18 '22 at 15:01
  • @OneCricketeer I used the ObjectMapper with the POJO, but for some reason the output object is empty (all fields are empty). I need to dig into that. Also, I used numbered fields only for example purposes; in the actual implementation they have meaningful names. Thank you for the reply! – Prasad Dec 30 '22 at 06:53

1 Answer


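I solved the issue with the following code. It writes the source POJO to Avro binary with a ReflectDatumWriter, reads the bytes back as a GenericRecord, and then deep-copies that record into the generated class: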

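    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.*;
    import org.apache.avro.reflect.ReflectDatumWriter;
    import org.apache.avro.specific.SpecificData;

    @SuppressWarnings("unchecked")
    public static <T> T getAvroRecord(Schema schema, ExtraPayload extraPayload) throws IOException {
        // Serialize the plain POJO to Avro binary via reflection, using the target schema.
        ReflectDatumWriter<ExtraPayload> datumWriter = new ReflectDatumWriter<>(schema);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
        datumWriter.write(extraPayload, encoder);
        encoder.flush();

        // Read the bytes back as a GenericRecord with the same schema.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(outputStream.toByteArray(), null);
        GenericRecord genericRecord = datumReader.read(null, decoder);

        // Convert the generic record into the generated specific class.
        return (T) SpecificData.get().deepCopy(schema, genericRecord);
    }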

Call the method and assign the output to the required class (entity):

    Etm etm = getAvroRecord(Etm.getClassSchema(), sourceData);

Etm is an Avro POJO class generated using avro-tools.

Prasad