
The Avro docs define the "default" attribute as: "A default value for this field, used when reading instances that lack this field (optional)."

This means that if the corresponding field is missing, the default value is taken.

But this does not seem to be the case. Consider the following student schema:

{
    "type": "record",
    "namespace": "com.example",
    "name": "Student",
    "fields": [
        {
            "name": "age",
            "type": "int",
            "default": -1
        },
        {
            "name": "name",
            "type": "string",
            "default": "null"
        }
    ]
}

The schema says that if the "age" field is missing, its value should be taken as -1. Likewise, a missing "name" field should get the value "null".

Now, if I try to construct a Student model from the following JSON:

{"age":70}

I get this exception:

org.apache.avro.AvroTypeException: Expected string. Got END_OBJECT

    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:698)
    at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:227)

It looks like the default is NOT working as expected. So what exactly is the role of the default here?

This is the code used to build the Student model:

Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, studentJson);
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);
return datumReader.read(null, decoder);

(The Student class is auto-generated by the Avro compiler from the student schema.)

Pavan
  • Possible duplicate of [Avro field default values](https://stackoverflow.com/questions/22938124/avro-field-default-values) – Ali Akbarpour Feb 26 '18 at 10:25
  • @Generic there is a small difference. There, the model is built using the builder, and the default works. It fails only when parsing a JSON string. A few articles pointed out that fields cannot go missing, which I found unjustified. If the field has to be present anyway, then I do not understand how the default attribute helps. – Pavan Feb 26 '18 at 10:40

2 Answers


I think there is some misunderstanding around default values, so hopefully my explanation will help other people as well. The default value supplies a value when a field is not present, but it is applied while an Avro object is being instantiated (in your case, when calling datumReader.read), and only for a field that the reader's schema has and the writer's schema lacks. It does not let the decoder read data that does not match the schema it was given, which is why the concept of a "schema registry" is useful in this kind of situation.

The following code works and lets you read your data:

Decoder decoder = DecoderFactory.get().jsonDecoder(Student.SCHEMA$, "{\"age\":70}");
// The reader's schema comes from the generated Student class.
SpecificDatumReader<Student> datumReader = new SpecificDatumReader<>(Student.class);

// Despite the variable name, this is the writer's schema: it describes
// the JSON as it was actually written, without the "name" field.
Schema expected = new Schema.Parser().parse("{\n" +
        "  \"type\": \"record\",\n" +
        "  \"namespace\": \"com.example\",\n" +
        "  \"name\": \"Student\",\n" +
        "  \"fields\": [{\n" +
        "    \"name\": \"age\",\n" +
        "    \"type\": \"int\",\n" +
        "    \"default\": -1\n" +
        "  }\n" +
        "  ]\n" +
        "}");

// setSchema sets the writer's schema used for resolution.
datumReader.setSchema(expected);
System.out.println(datumReader.read(null, decoder));

As you can see, I am specifying the schema that was used to "write" the JSON input, which does not contain the field "name". Because your schema contains a default value for that field, when you print the record you will see the name with your default value:

{"age": 70, "name": "null"}

In case you did not already know: "null" here is not really a null value, it is a string with the value "null".
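
To make the rule concrete, here is a toy model in plain Java of how a default gets applied when a writer's schema is resolved against a reader's schema. It is only a sketch under stated assumptions, not Avro's actual implementation: the class `DefaultResolutionSketch`, the method `resolve`, and the map-based stand-ins for schemas and records are all invented for illustration. A field that the writer's schema declares must be present in the data, and only a field the writer's schema lacks falls back to the reader's default.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of the default-value rule; NOT Avro's real implementation.
public class DefaultResolutionSketch {

    /**
     * @param writerFields   field names declared by the writer's schema (hypothetical representation)
     * @param readerDefaults reader field name -> default value, in schema order
     * @param data           the field values found in the input
     */
    public static Map<String, Object> resolve(Set<String> writerFields,
                                              Map<String, Object> readerDefaults,
                                              Map<String, Object> data) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            if (writerFields.contains(name)) {
                // The writer declared this field, so the data must carry it;
                // the default is not consulted here.
                if (!data.containsKey(name)) {
                    throw new IllegalStateException("Expected a value for field: " + name);
                }
                record.put(name, data.get(name));
            } else {
                // The writer never declared this field: use the reader's default,
                // even if the input happens to contain a value for it.
                record.put(name, field.getValue());
            }
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> readerDefaults = new LinkedHashMap<>();
        readerDefaults.put("age", -1);
        readerDefaults.put("name", "null");

        // Writer schema declares only "age": "name" comes from the default.
        Map<String, Object> filled = resolve(Set.of("age"), readerDefaults, Map.of("age", 70));
        System.out.println("name is the string \"null\": " + "null".equals(filled.get("name")));

        // Writer schema declares both fields but the data lacks "name": this fails.
        try {
            resolve(Set.of("age", "name"), readerDefaults, Map.of("age", 70));
        } catch (IllegalStateException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

The second call in `main` mirrors the situation in the question, where the same schema is used for both decoding and reading, so the missing "name" is an error rather than a reason to use the default.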

hlagos
  • `datumReader.setSchema(expected)` works for the missing field. But unfortunately, when the input JSON does contain the "name" field, it still sets the value to "null". That is, if the input is `{"age": 70, "name": "john"}`, I get a Student model with name set to "null", whereas I expect it to be set to "john". Is there no other way to work around these missing fields? – Pavan Feb 27 '18 at 05:17
  • Two options: send the writer schema as part of the message (expensive), or use a schema registry. – hlagos Feb 28 '18 at 23:35

Just to add to what is already said in the answer above: for a field to be null when it is not present, make its type a union with null. Otherwise, the value that goes in is just a string spelled "null". Example schema:

{
  "name": "name",
  "type": [
    "null",
    "string"
  ],
  "default": null
}

Then, if you pass in {"age":70} and retrieve the record, you will get the following:

{"age":70,"name":null}
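
The difference matters to whatever code consumes the record. As a small sketch (the class `NullDefaultSketch` and the helper `describe` are hypothetical, and the name is passed in as a plain `CharSequence` rather than read from a real Student object), a real null can be tested for, while the string "null" is indistinguishable from an actual name:

```java
public class NullDefaultSketch {

    // Hypothetical helper: reports whether a name is absent or present.
    public static String describe(CharSequence name) {
        return name == null ? "<no name>" : "name=" + name;
    }

    public static void main(String[] args) {
        // With the union type and "default": null, a missing name is a real null.
        System.out.println(describe(null));   // prints "<no name>"

        // With "type": "string" and "default": "null", a missing name is a
        // four-character string that looks like any other name.
        System.out.println(describe("null")); // prints "name=null"
    }
}
```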
arvin_v_s