The example schema below contains a field, dataTypeGroupName, that is a union of null and string.

Schema

    {
      "type":"record",
      "name":"DataFlowEntity",
      "namespace":"org.sdf.manage.commons.server",
      "fields":
      [
        {"name":"dataTypeGroupName","type":["null","string"]},
        {"name":"dataTypeName","type":"string"},
        {"name":"dataSchemaVersion","type":"string"}
      ]
    }

I want to convert the following JSON object

Object

{
  "dataTypeGroupName": "dg_1",
  "dataTypeName": "dt_1",
  "dataSchemaVersion": "1"
}

into an Avro object corresponding to the schema above. I tried Avro's JsonDecoder with the code snippet below:

    String dataFlowEntity = "{\"dataTypeGroupName\": \"dg_1\", \"dataTypeName\": \"dt_1\", \"dataSchemaVersion\": \"1\"}";
    Schema schema = DataFlowEntity.SCHEMA$;
    InputStream inputStream = new ByteArrayInputStream(dataFlowEntity.getBytes());
    DataInputStream dInputStream = new DataInputStream(inputStream);
    Decoder decoder = DecoderFactory.get().jsonDecoder(schema, dInputStream);
    DatumReader<DataFlowEntity> datumReader = new GenericDatumReader<DataFlowEntity>(schema);
    DataFlowEntity dataFlowEntityObject = DataFlowEntity.newBuilder().build();
    dataFlowEntityObject = datumReader.read(null, decoder);

It fails with the following exception:

threw exception [org.apache.avro.AvroRuntimeException: org.apache.avro.AvroRuntimeException: Field dataTypeGroupName type:UNION pos:0 not set and has no default value] with root cause
org.apache.avro.AvroRuntimeException: Field dataTypeGroupName type:UNION pos:0 not set and has no default value
  at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:874)
  at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
Mac
  • Possible duplicate of [How to fix Expected start-union. Got VALUE\_NUMBER\_INT when converting JSON to Avro on the command line?](http://stackoverflow.com/questions/27485580/how-to-fix-expected-start-union-got-value-number-int-when-converting-json-to-av) – Afshin Moazami Dec 15 '16 at 19:01

3 Answers

If using Node.js is an option, you can use avsc to do the conversion for you. Calling clone with wrapUnions set automatically wraps each value into the first union branch it matches.

Using your example:

var avsc = require('avsc');

var type =  avsc.parse({
  "type":"record",
  "name":"DataFlowEntity",
  "namespace":"org.sdf.manage.commons.server",
  "fields": [
    {"name":"dataTypeGroupName","type":["null","string"]},
    {"name":"dataTypeName","type":"string"},
    {"name":"dataSchemaVersion","type":"string"}
  ]
}, {wrapUnions: true});

var invalidRecord = {
  "dataTypeGroupName": "dg_1",
  "dataTypeName": "dt_1",
  "dataSchemaVersion": "1"
};

var validRecord = type.clone(invalidRecord, {wrapUnions: true});
// == {
//   "dataTypeGroupName":{"string":"dg_1"},
//   "dataTypeName":"dt_1",
//   "dataSchemaVersion":"1"
// }
mtth

Check this project out: https://github.com/allegro/hermes/pull/749/files

You are interested in the JsonAvroConverter. It deserializes plain JSON (without union type wrappers) into Avro generated objects (which do have union types). For each union field, it takes the branch types from the schema and tries them one by one. It works very well in our case.

This class does the actual work: https://github.com/allegro/json-avro-converter/blob/master/converter/src/main/java/tech/allegro/schema/json2avro/converter/JsonGenericRecordReader.java
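
To illustrate the "try the union branches one by one" idea, here is a minimal sketch in plain Java. It is not the converter's actual code: the class name UnionBranchPicker and both methods are hypothetical, it uses no Avro classes, and it only covers a few primitive type names. Record, array and map branches would need their own structural checks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Avro or json-avro-converter.
class UnionBranchPicker {

    /** Returns true if a parsed JSON value is acceptable for the given Avro primitive type name. */
    static boolean matches(String avroType, Object value) {
        switch (avroType) {
            case "null":    return value == null;
            case "string":  return value instanceof String;
            case "boolean": return value instanceof Boolean;
            case "int":     return value instanceof Integer;
            case "long":    return value instanceof Integer || value instanceof Long;
            case "double":  return value instanceof Number;
            default:        return false; // records, arrays, maps etc. are not handled in this sketch
        }
    }

    /** Tries the union branches in schema order and returns the first one that fits the value. */
    static String pickBranch(List<String> branches, Object value) {
        for (String branch : branches) {
            if (matches(branch, value)) {
                return branch;
            }
        }
        throw new IllegalArgumentException("No union branch matches value: " + value);
    }

    public static void main(String[] args) {
        // Branch order taken from the schema in the question: ["null","string"]
        List<String> union = Arrays.asList("null", "string");
        System.out.println(pickBranch(union, "dg_1"));
        System.out.println(pickBranch(union, null));
    }
}
```

Because branches are tried in schema order, an ambiguous value resolves to the first branch that accepts it, so the order of types in the union matters.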

Regards!

Vassilis

There is a new JSON encoder in the works that should address this common issue:

https://issues.apache.org/jira/browse/AVRO-1582

https://github.com/zolyfarkas/avro

Many people run into this when working with Avro.

If you change your JSON to the following, it should work:

{
  "dataTypeGroupName": {"string" : "dg_1"},
  "dataTypeName": "dt_1",
  "dataSchemaVersion": "1"
}

This is because Avro's JSON encoding wraps every non-null union value in an object keyed by the branch type name. Unfortunately, this applies even to simple unions that represent an optional value, which would not need a JSON object wrapper to disambiguate. Avro's intent never seemed to be to generate friendly JSON, but rather to use JSON as a serialization format.

For more details: https://avro.apache.org/docs/1.7.7/spec.html#json_encoding
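
As a rough illustration of that wrapping, the sketch below rewrites a plain record into the union-wrapped form using only java.util maps. The class UnionWrapper and the wrapUnions method are hypothetical names for this example, not Avro APIs, and the caller is assumed to supply which fields are unions and which branch name to use for each. Null values are left as null, which is how the spec encodes the null branch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Avro.
class UnionWrapper {

    /**
     * Returns a copy of the record in which every field listed in unionFields
     * is rewritten to Avro's JSON union encoding: null stays null, any other
     * value becomes a single-entry map {branchName: value}.
     */
    static Map<String, Object> wrapUnions(Map<String, Object> record,
                                          Map<String, String> unionFields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            String branch = unionFields.get(entry.getKey());
            Object value = entry.getValue();
            if (branch != null && value != null) {
                Map<String, Object> wrapped = new LinkedHashMap<>();
                wrapped.put(branch, value);
                out.put(entry.getKey(), wrapped);
            } else {
                out.put(entry.getKey(), value); // non-union field, or null branch
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> plain = new LinkedHashMap<>();
        plain.put("dataTypeGroupName", "dg_1");
        plain.put("dataTypeName", "dt_1");
        plain.put("dataSchemaVersion", "1");

        // Assumption: the only union field is dataTypeGroupName, with non-null branch "string".
        Map<String, String> unionFields =
                Collections.singletonMap("dataTypeGroupName", "string");

        System.out.println(wrapUnions(plain, unionFields));
    }
}
```

The resulting map can then be serialized with any JSON library and fed to the JsonDecoder. For a union whose branch is a named type such as a record, the wrapper key would be the type's name rather than a primitive name like "string".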

ppearcy
  • It works fine if union contains primitive data types. But how make it work for union containing user defined data types like record. – Mac Nov 16 '15 at 07:00